Preface


This book is designed for college students who want to teach mathematics in 
high school, but it can serve as a text for standard abstract algebra courses as 
well. First courses in abstract algebra usually cover number theory, groups, 
and commutative rings. We have found that the first encounter with groups is 
not only inadequate for future teachers of high school mathematics, it is also 
unsatisfying for other mathematics students. Hence, we focus here on number 
theory, polynomials, and commutative rings. We introduce groups in our last 
chapter, for the earlier discussion of commutative rings allows us to explain 
how groups are used to prove Abel’s Theorem: there is no generalization of the 
quadratic, cubic, and quartic formulas giving the roots of the general quintic 
polynomial. A modest proposal: undergraduate abstract algebra should be a 
sequence of two courses, with number theory and commutative rings in the 
first course, and groups and linear algebra (with scalars in arbitrary fields) in 
the second. 

We invoke an historically accurate organizing principle: Fermat’s Last The- 
orem (in Victorian times, the title of this book would have been Learning Mod- 
ern Algebra by Studying Early Attempts, Especially Those in the Nineteenth 
Century, that Tried to Prove Fermat’s Last Theorem Using Elementary Meth- 
ods). To be sure, another important problem at that time that contributed to 
modern algebra was the search for formulas giving the roots of polynomials. 
This search is intertwined with the algebra involved in Fermat’s Last Theo- 
rem, and we do treat this part of algebra as well. The difference between our 
approach and the standard approach is one of emphasis: the natural direction 
for us is towards algebraic number theory, whereas the usual direction is to- 
wards Galois theory. 

Four thousand years ago, the quadratic formula and the Pythagorean The- 
orem were seen to be very useful. To teach them to new generations, it was 
best to avoid square roots (which, at the time, were complicated to compute), 
and so problems were designed to have integer solutions. This led to Pythagorean
triples: positive integers a, b, c satisfying $a^2 + b^2 = c^2$. Two thousand
years ago, all such triples were found and, when studying them in the seventeenth
century, Fermat wondered whether there are positive integer solutions
to $a^n + b^n = c^n$ for n > 2. He claimed in a famous marginal note that there
are no solutions, but only his proof of the case n = 4 is known. This problem,
called Fermat’s Last Theorem, intrigued many of the finest mathematicians, 
but it long resisted all attempts to solve it. Finally, using sophisticated tech- 
niques of algebraic geometry developed at the end of the twentieth century, 
Andrew Wiles proved Fermat’s Last Theorem in 1995. 



Before its solution, Fermat’s Last Theorem was a challenge to mathemati- 
cians (as climbing Mount Everest was a challenge to mountaineers). There are 
no dramatic applications of the result, but it is yet another triumph of human in- 
tellect. What is true is that, over the course of 350 years, much of contemporary 
mathematics was invented and developed in trying to deal with it. The num- 
ber theory recorded in Euclid was shown to have similarities with the behavior 
of polynomials, and generalizations of prime numbers and unique factoriza- 
tion owe their initial study to attempts at proving Fermat’s Last Theorem. But 
these topics are also intimately related to what is actually taught in high school. 
Thus, abstract algebra is not merely beautiful and interesting, but it is also a 
valuable, perhaps essential, topic for understanding high school mathematics. 

Some Features of This Book 

We include sections in every chapter, called Connections, in which we explicitly
show how the material up to that point can help the reader understand and
implement the mathematics that high school teachers use in their profession. 
This may include the many ways that results in abstract algebra connect with 
core high school ideas, such as solving equations or factoring. But it may also 
include mathematics for teachers themselves, that may or may not end up “on 
the blackboard;” things like the use of abstract algebra to make up good prob- 
lems, to understand the foundations of topics in the curriculum, and to place 
the topics in the larger landscape of mathematics as a scientific discipline. 

Many students studying abstract algebra have problems understanding 
proofs; even though they can follow each step of a proof, they wonder how 
anyone could have discovered its argument in the first place. To address such 
problems, we have tried to strike a balance between giving a logical develop- 
ment of results (so the reader can see how everything fits together in a coherent 
package) and discussing the messier kinds of thinking that lead to discovery 
and proofs. A nice aspect of this sort of presentation is that readers participate 
in doing mathematics as they learn it. 

One way we implement this balance is our use of several design features, 
such as the Connections sections described above. Here are some others. 

• Sidenotes provide advice, comments, and pointers to other parts of the text 
related to the topic at hand. What could be more fitting for a book related to 
Fermat’s Last Theorem than to have large margins? 

• Interspersed in the text are boxed “callouts,” such as How to Think About 
It, which suggest how ideas in the text may have been conceived in the 
first place, how we view the ideas, and what we guess underlies the formal 
exposition. Some other callouts are: 

Historical Note, which provides some historical background. It often helps 
to understand mathematical ideas if they are placed in historical con- 
text; besides, it’s interesting. The biographies are based on those in the 
MacTutor History of Mathematics Archive of the School of Mathemat- 
ics and Statistics, University of St. Andrews, Scotland. It can be found 
on the internet: its URL is 

www-history.mcs.st-andrews.ac.uk
Etymology, which traces out the origin of some mathematical terms. We 
believe that knowing the etymology of terms often helps to understand 
the ideas they name. 






Etymology. The word mathematics comes from classical Greek; it 
means “knowledge,” “something learned.” But in ancient Rome through 
the thirteenth century, it meant “astronomy” and “astrology.” From the 
Middle Ages, it acquired its present meaning. 

The word arithmetic comes from the Greek word meaning “the art of 
counting.” The word geometry, in classical Greek, meant “science of 
measuring;” it arose from an earlier term meaning “land survey.” 


It is a pleasure to acknowledge those who have contributed valuable com- 
ments, suggestions, ideas, and help. We thank Don Albers, Carol Baxter, Bruce 
Berndt, Peter Braunfeld, Keith Conrad, Victoria Corkery, Don DeLand, Ben 
Fischer, Andrew Granville, Heini Halberstam, Zaven Karian, Tsit-Yuen Lam, 
Paul Monsky, Beverly Ruedi, Glenn Stevens, and Stephen Ullom. 


Conrad’s website www.math.uconn.edu/~kconrad/blurbs/ is full of beautiful ideas.


A Note to Students 

The heart of a mathematics course lies in its problems. We have tried to or- 
chestrate them to help you build a solid understanding of the mathematics in 
the sections. Everything afterward will make much more sense if you work 
through as many exercises as you can, especially those that appear difficult. 
Quite often, you will learn something valuable from an exercise even if you 
don’t solve it completely. For example, a problem you can’t solve may show 
that you haven’t fully understood an idea you thought you knew; or it may 
force you to discover a fact that needs to be established to finish the solution. 
There are two special kinds of exercises. 

• Those labeled Preview may seem to have little to do with the section at hand; 
they are designed to foreshadow upcoming topics, often with numerical ex- 
periments. 

• Those labeled Take it Further develop interesting ideas that are connected 
to the main themes of the text, but are somewhat off the beaten path. They 
are not essential for understanding what comes later in the text. 

An exercise marked with an asterisk, such as 1.8*, means that it is either 
used in some proof or it is referred to elsewhere in the text. For ease of finding 
such exercises, all references to them have the form “Exercise 1.8 on page 6”
giving both its number and the number of the page on which it occurs. 


A Note to Instructors 

We recommend giving reading assignments to preview upcoming material. 
This contributes to balancing experience and formality as described above, and 
it saves time. Many important pages can be read and understood by students, 
and they should be discussed in class only if students ask questions about them. 

It is possible to use this book as a text for a three hour one-semester course, 
but we strongly recommend that it be taught four hours per week. 


Al Cuoco and Joe Rotman
Early Number Theory


Algebra, geometry, and number theory have been used for millennia. Of course, 
numbers are involved in counting and measuring, enabling commerce and ar- 
chitecture. But reckoning was also involved in life and death matters such as 
astronomy, which was necessary for navigation on the high seas (naval com- 
merce flourished four thousand years ago) as well as to predict the seasons, 
to apprise farmers when to plant and when to harvest. Ancient texts that have 
survived from Babylon, China, Egypt, Greece, and India provide evidence for 
this. For example, the Nile River was the source of life in ancient Egypt, for 
its banks were the only arable land in the midst of desert. Mathematics was 
used by the priestly class to predict flooding as well as to calculate area (taxes 
were assessed according to the area of land, which changed after flood waters 
subsided). And their temples and pyramids are marvels of engineering. 

1.1 Ancient Mathematics 

The quadratic formula was an important mathematical tool, and so it was 
taught to younger generations training to be royal scribes. Here is a problem 
from an old Babylonian cuneiform text dating from about 1700 BCE. We quote 
from van der Waerden [35], p. 61 (but we write numbers in base 10 instead 
of in base 60, as did the Babylonians). We also use modern algebraic notation 
that dates from the fifteenth and sixteenth centuries (see Cajori [6]). 

I have subtracted the side of the square from the area, and it is 870. What
is the side of my square?

The text rewrites the data as the quadratic equation $x^2 - x = 870$; it then
gives a series of steps showing how to find the solution, illustrating that the 
Babylonians knew the quadratic formula. 

Historians say that teaching played an important role in ancient mathe- 
matics (see van der Waerden [35], pp. 32-33). To illustrate, the coefficients 
of the quadratic equation were chosen wisely: the discriminant $b^2 - 4ac =
1 - 4(-870) = 3481 = 59^2$ is a perfect square. Were the discriminant not a
perfect square, the problem would have been much harder, for finding square
roots was not routine in those days. Thus, the quadratic in the text is well-chosen
for teaching the quadratic formula; a good teaching prize would not be
awarded for $x^2 - 47x = 210$.
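The remark about well-chosen coefficients can be checked numerically. The following is a minimal sketch, not anything from the Babylonian text: the helper names `discriminant` and `is_perfect_square` are our own, and we rewrite each problem in the form $ax^2 + bx + c = 0$ before computing $b^2 - 4ac$.

```python
from math import isqrt


def discriminant(a: int, b: int, c: int) -> int:
    """Discriminant b^2 - 4ac of a*x^2 + b*x + c = 0 (hypothetical helper)."""
    return b * b - 4 * a * c


def is_perfect_square(n: int) -> bool:
    """True when n is the square of a nonnegative integer."""
    return n >= 0 and isqrt(n) ** 2 == n


# Babylonian problem: x^2 - x = 870, that is, x^2 - x - 870 = 0.
d_good = discriminant(1, -1, -870)
# The awkward variant mentioned in the text: x^2 - 47x = 210.
d_bad = discriminant(1, -47, -210)

print(d_good, is_perfect_square(d_good))  # 3481 True
print(d_bad, is_perfect_square(d_bad))    # 3049 False
```

With discriminant $59^2$, the positive root of the first equation is (1 + 59)/2 = 30, and indeed $30^2 - 30 = 870$.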

The Babylonians were not afraid of cubics. Another of their problems from 
about the same time is 


The number 59 may have 
been chosen because 
the Babylonians wrote 
numbers in base 60, and 
59 = 60 - 1. 


Solve $12x^3 = 3630$,

and the answer was given. The solution was, most likely, obtained by using 
tables of approximations of cube roots. 

A standard proof of the quadratic formula is by “completing the square.”
This phrase can be taken literally. Given a quadratic $x^2 + bx = c$ with b and c
positive, we can view $x^2 + bx$ as the shaded area in Figure 1.1. Complete the


Figure 1.1. Completing the Square.


figure to a square by attaching the corner square having area $\tfrac{1}{2}b \times \tfrac{1}{2}b = \tfrac{1}{4}b^2$;
the new square has area

$$c + \tfrac{1}{4}b^2 = x^2 + bx + \tfrac{1}{4}b^2 = \left(x + \tfrac{1}{2}b\right)^2.$$

Thus, $x + \tfrac{1}{2}b = \sqrt{c + \tfrac{1}{4}b^2}$, which simplifies to the usual formula giving
the roots of $x^2 + bx - c$. The algebraic proof of the validity of the quadratic
formula works without assuming that b and c are positive, but the idea of the 
proof is geometric. 
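To see the geometric recipe at work on numbers, here is a small sketch that follows the completed-square picture step by step for $x^2 + bx = c$ with b and c positive. The function name and the sample values b = 2, c = 35 are our own choices for illustration.

```python
from math import isclose, sqrt


def positive_root_by_completing_square(b: float, c: float) -> float:
    """Positive solution of x^2 + b*x = c, assuming b > 0 and c > 0."""
    half_b = b / 2                        # side of the attached corner square
    completed_area = c + half_b ** 2      # area of the completed big square
    return sqrt(completed_area) - half_b  # big side is x + b/2


x = positive_root_by_completing_square(2, 35)
print(x)                            # 5.0
print(isclose(x * x + 2 * x, 35))   # True
```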


In [35], pp. 26-35, van 
der Waerden considers 
the origin of proofs in 
mathematics, suggesting 
that they arose in Europe 
and Asia in Neolithic 
(late Stone Age) times, 
4500 BCE-2000 BCE. 


Exercise 1.4 on page 5
asks you to show that the 
rhombus in Figure 1.2 
with sides of length c is a 
square. 


Figure 1.2. Pythagorean Theorem.

The Babylonians were aware of the Pythagorean Theorem. Although they 
believed it, there is no evidence that the Babylonians had proved the Pythag- 
orean Theorem; indeed, no evidence exists that they even saw a need for a 
proof. Tradition attributes the first proof of this theorem to Pythagoras, who 
lived around 500 BCE, but no primary documents extant support this. An ele- 
gant proof of the Pythagorean Theorem is given on page 354 of Heath’s 1926 
translation [16] of Euclid’s The Elements; the theorem follows from equality
of the areas of the two squares in Figure 1.2. 
Here is an ancient application of the Pythagorean Theorem. Aristarchus
(ca. 310 BCE-250 BCE) saw that the Moon and the Sun appear to be about
the same size, and he wondered how far away they are. His idea was that at
the time of the half-moon, the Earth E, Moon M, and Sun S form a right
triangle with right angle $\angle M$ (that is, looking up at the Moon, the line of sight
seems to be perpendicular to the Sun’s rays). The Pythagorean Theorem gives



$|SE|^2 = |SM|^2 + |ME|^2$. Thus, the Earth is farther from the Sun than it is
from the Moon. Indeed, at sunset, $\alpha = \angle E$ seems to be very close to 90°: if we
are looking at the Moon and we wish to watch the Sun dip below the horizon,
we must turn our head all the way to the left. Aristarchus knew trigonometry;
he reckoned that $\cos \alpha$ was small, and he concluded that the Sun is very much
further from the Earth than is the Moon.


Example 1.1. Next, we present a geometric problem from a Chinese collection
of mathematical problems, Nine Chapters on the Mathematical Art, written
during the Han Dynasty about two thousand years ago. Variations of this
problem still occur in present day calculus books! 


There is a door whose height and width are unknown, and a pole whose
length p is also unknown. Carried horizontally, the pole does not fit by 4
ch’ih; vertically, it does not fit by 2 ch’ih; slantwise, it fits exactly. What
are the height, width, and diagonal of the door? 


There are similar problems 
from the Babylonians and 
other ancient cultures. 



Figure 1.4. Door Problem.

The data give a right triangle with sides p - 4, p - 2, and p, and the Pythagorean
Theorem gives the equation $(p - 4)^2 + (p - 2)^2 = p^2$, which
simplifies to $p^2 - 12p + 20 = 0$. The discriminant $b^2 - 4ac$ is 144 - 80 = 64,
a perfect square, so that p = 10 and the door has height 8 and width 6 (the
other root of the quadratic is p = 2, which does not fit the physical data).
The sides of the right triangle are 6, 8, 10, and it is similar to the triangle with
sides 3, 4, 5. Again, the numbers have been chosen wisely. The idea is to teach
The word hypotenuse
comes from the Greek verb 
meaning to stretch. 


students how to use the Pythagorean Theorem and the quadratic formula. As 
we have already remarked, computing square roots was then quite difficult, so 
that the same problem for a pole of length p = 12 would not have been very 
bright because there is no right triangle with sides of integral length that has 
hypotenuse 12. ▲ 
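The arithmetic of the door problem, and the closing remark about a pole of length 12, can be checked with a short sketch. The helper names `integer_roots` and `legs_for_hypotenuse` are ours, and the second function is only a brute-force search over small cases, not a proof.

```python
from math import isqrt


def integer_roots(b: int, c: int) -> list[int]:
    """Integer roots of the monic quadratic p^2 + b*p + c = 0, smallest first."""
    disc = b * b - 4 * c
    if disc < 0:
        return []
    r = isqrt(disc)
    if r * r != disc or (-b + r) % 2 != 0:
        return []
    return sorted({(-b - r) // 2, (-b + r) // 2})


def legs_for_hypotenuse(c: int) -> list[tuple[int, int]]:
    """All pairs (a, b) of positive integers with a <= b and a^2 + b^2 = c^2."""
    pairs = []
    for a in range(1, c):
        b_squared = c * c - a * a
        b = isqrt(b_squared)
        if b >= a and b * b == b_squared:
            pairs.append((a, b))
    return pairs


# Door problem: p^2 - 12p + 20 = 0; only roots with p - 4 > 0 fit the data.
roots = integer_roots(-12, 20)
pole = [p for p in roots if p - 4 > 0]
print(roots, pole)              # [2, 10] [10]
print(legs_for_hypotenuse(10))  # [(6, 8)]
print(legs_for_hypotenuse(12))  # []
```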

Are there right triangles whose three sides have integral length that are not 
similar to the 3, 4, 5 triangle? You are probably familiar with the 5, 12, 13 triangle.
Let’s use △(a, b, c) (lower case letters) to denote the triangle whose
sides have length a, b, and c; if △(a, b, c) is a right triangle, then c denotes
the length of its hypotenuse, while a and b are its legs. Thus, the right triangle
with side-lengths 5, 12, 13 is denoted by △(5, 12, 13). (We use the usual
notation, △ABC, to denote a triangle whose vertices are A, B, C.)

Definition. A triple (a, b, c) of positive integers with $a^2 + b^2 = c^2$ is called
a Pythagorean triple.

If (a, b, c) is a Pythagorean triple, then the triangles △(a, b, c) and △(b, a, c)
are the same. Thus, we declare that the Pythagorean triples (a, b, c) and (b, a, c)
are the same.


Historical Note. Pythagorean triples are the good choices for problems teach- 
ing the Pythagorean Theorem. There are many of them: Figure 1.5 shows a 
Babylonian cuneiform tablet dating from the dynasty of Hammurabi, about 
1800 BCE, whose museum name is Plimpton 322, which displays fifteen 
Pythagorean triples (translated into our number system). 


      b        a        c

    120      119      169
   3456     3367     4825
   4800     4601     6649
  13500    12709    18541
     72       65       97
    360      319      481
   2700     2291     3541
    960      799     1249
    600      481      769
   6480     4961     8161
     60       45       75
   2400     1679     2929
    240      161      289
   2700     1771     3229
     90       56      106

Figure 1.5. Plimpton 322.
It is plain that the Babylonians had a way to generate large Pythagorean
triples. Here is one technique they might have used. Write

$$a^2 = c^2 - b^2 = (c + b)(c - b).$$

If there are integers m and n with

$$c + b = m^2$$
$$c - b = n^2,$$

then

$$a = \sqrt{(c + b)(c - b)} = mn. \tag{1.1}$$

We can also solve for b and c:

$$b = \tfrac{1}{2}(m^2 - n^2) \tag{1.2}$$

$$c = \tfrac{1}{2}(m^2 + n^2). \tag{1.3}$$

Summarizing, here is what we call the Babylonian method. Choose odd numbers
m and n (forcing $m^2 + n^2$ and $m^2 - n^2$ to be even, so that b and c are
integers), and define a, b, and c by Eqs. (1.1), (1.2), and (1.3). For example, if
m = 7 and n = 5, we obtain 35, 12, 37. If we choose m = 179 and n = 71,
we obtain 13500, 12709, 18541, the largest triple on Plimpton 322.
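The Babylonian method is easy to mechanize. The sketch below is our own illustration (the function name `babylonian_triple` is hypothetical, and we assume m > n > 0 so that b is positive); it applies Eqs. (1.1), (1.2), and (1.3) directly and returns the triple in the order (a, b, c).

```python
def babylonian_triple(m: int, n: int) -> tuple[int, int, int]:
    """Return (a, b, c) from Eqs. (1.1)-(1.3) for odd integers m > n > 0."""
    if m % 2 == 0 or n % 2 == 0 or not m > n > 0:
        raise ValueError("need odd integers m > n > 0")
    a = m * n                    # Eq. (1.1)
    b = (m * m - n * n) // 2     # Eq. (1.2); exact because m, n are odd
    c = (m * m + n * n) // 2     # Eq. (1.3)
    return a, b, c


print(babylonian_triple(7, 5))     # (35, 12, 37)
print(babylonian_triple(179, 71))  # (12709, 13500, 18541)
print(babylonian_triple(3, 1))     # (3, 4, 5)
```

Note that the function reports a = mn first, so the Plimpton 322 entry appears as (12709, 13500, 18541); as triples, this is the same as the row 13500, 12709, 18541.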

The Babylonian method does not give all Pythagorean triples. For example, 
(6, 8, 10) is a Pythagorean triple, but there are no odd numbers m > n with
6 = mn or 8 = mn. Of course, (6, 8, 10) is not significantly different from
(3, 4, 5), which arises from 3 > 1. In the next section, we will show, following
Diophantus, ca. 250 CE, how to find all Pythagorean triples. But now we
should recognize that practical problems involving applications of pure math- 
ematics (e.g., surveying) led to efforts to teach this mathematics effectively, 
which led to more pure mathematics (Pythagorean triples) that seems at first to 
have no application outside of teaching. The remarkable, empirical, fact is that 
pure mathematics yields new and valuable applications. For example, we shall 
see in the next section that classifying Pythagorean triples leads to simplifying 
the verification of some trigonometric identities as well as the solution of cer- 
tain integration problems (for example, we will see a natural way to integrate 
sec x).

Exercises 

1.1 Prove the quadratic formula for the roots of $ax^2 + bx + c = 0$ whose coefficients
a, b, and c may not be positive.

1.2 Give a geometric proof that $(a + b)^2 = a^2 + 2ab + b^2$ for a, b positive.

1.3 * Let $f(x) = ax^2 + bx + c$ be a quadratic whose coefficients a, b, c are rational.
Prove that if f(x) has one rational root, then its other root is also rational.

1.4 *

(i) Prove that the rhombus with side lengths c in the left square of Figure 1.2 is
a square.

(ii) Prove the Pythagorean Theorem in a way suggested by Figure 1.2.

(iii) Give a proof of the Pythagorean Theorem different from the one suggested
by Figure 1.2.


After all, what practi- 
cal application does 
the Pythagorean triple 
(13500, 12709, 18541) 
have? 


The book by Loomis [20] 
contains 370 different 
proofs of the Pythagorean 
Theorem, by the author’s 
count. 
1.5 Here is another problem from Nine Chapters on the Mathematical Art. A pond is
10 ch’ih square. A reed grows at its center and extends 1 ch’ih out of the water. 
If the reed is pulled to the side of the pond, it reaches the side precisely. What are 
the depth of the water and the length of the reed? 

Answer. Depth = 12 ch’ih and length = 13 ch’ih. 

1.6 * 

(i) Establish the algebraic identity

$$\left(\frac{a + b}{2}\right)^2 - \left(\frac{a - b}{2}\right)^2 = ab.$$

(ii) Use (i) to establish the Arithmetic-Geometric Mean Inequality: if a and b
are positive reals, then

$$\sqrt{ab} \le \tfrac{1}{2}(a + b).$$

When is there equality?

(iii) Show how to dissect an a × b rectangle so that it fits inside a square with
side-length (a + b)/2. How much is “left over?”

Hint. Try it with numbers. Cut an 8 × 14 rectangle to fit inside an 11 × 11
square.

(iv) Show that a rectangle of maximum area with fixed perimeter is a square.

(v) The hyperbolic cosine is defined by

$$\cosh x = \tfrac{1}{2}(e^x + e^{-x}).$$

Prove that $\cosh x \ge 1$ for all real numbers x, while $\cosh x = 1$ if and only if
x = 0.

(vi) Use Figure 1.6 to give another proof of the Arithmetic-Geometric Mean In- 
equality. 



1.7 * Prove that there is no Pythagorean triple (a,b, c) with c = 12. 

1.8 * Let (a, b, c) be a Pythagorean triple. 

(i) Prove that the legs a and b cannot both be odd. 

(ii) Show that the area of △(a, b, c) is an integer.

1.9 * Show that 5 is not the area of a triangle whose side-lengths form a Pythagorean
triple.

1.10 * Let (a, b, c) be a Pythagorean triple. If m is a positive integer, prove that
(ma, mb, mc) is also a Pythagorean triple.

1.11 (Converse of Pythagorean Theorem). * Let △ = △(a, b, c) be a triangle with
sides of lengths a, b, c (positive real numbers, not necessarily integers). Prove that
if $a^2 + b^2 = c^2$, then △ is a right triangle.

Hint. Construct a right triangle △' with legs of lengths a, b, and prove that △' is
congruent to △ by side-side-side.

1.12 * Prove that every Pythagorean triple (a, b, c) arises from a right triangle △(a, b, c)
having sides of lengths a, b, c.

1.13 If P = (a, b, c) is a Pythagorean triple, define r(P) = c/a. If we label the Pythagorean
triples on Plimpton 322 as $P_1, \ldots, P_{15}$, show that $r(P_i)$ is decreasing:
$r(P_i) > r(P_{i+1})$ for all $i \le 14$.

1.14 * If (a, b, c) is a Pythagorean triple, show that (a/c, b/c) is a point on the graph
of $x^2 + y^2 = 1$. What is the graph of $x^2 + y^2 = 1$?

1.15 Preview. Let L be the line through (-1, 0) with slope t.

(i) If $t = \tfrac{1}{2}$, find all the points where L intersects the graph of $x^2 + y^2 = 1$.

Answer. $(\tfrac{3}{5}, \tfrac{4}{5})$.

(ii) If $t = \tfrac{3}{2}$, find all the points where L intersects the graph of $x^2 + y^2 = 1$.

Answer. $(-\tfrac{5}{13}, \tfrac{12}{13})$.

(iii) Pick a rational number t, not $\tfrac{1}{2}$ or $\tfrac{3}{2}$, and find all the points where L intersects
the graph of $x^2 + y^2 = 1$.

(iv) Suppose ℓ is a line that contains (-1, 0) with slope r. If r is a rational number,
show that ℓ intersects the graph of $x^2 + y^2 = 1$ in two points, each of which
has rational number coordinates.

1.16 Preview. A Gaussian integer is a complex number a + bi where both a and b
are integers. Pick six Gaussian integers r + si with r > s > 0 and square them.
State something interesting that you see in your results.

1.17 Preview. Consider a complex number z = q + ip, where q > p are positive
integers. Prove that

$$(q^2 - p^2, 2qp, q^2 + p^2)$$

is a Pythagorean triple by showing that $|z^2| = |z|^2$.

1.18 Preview. Show, for all real numbers m and n, that

$$\left[\tfrac{1}{2}(m + n) + \tfrac{1}{2}(m - n)i\right]^2 = mn + \tfrac{1}{2}(m^2 - n^2)i.$$


1.2 Diophantus 

We are going to classify Pythagorean triples using a geometric method of Dio- 
phantus that describes all Pythagorean triples. 


Historical Note. We know very little about the life of Diophantus. He was 
a mathematician who lived in Alexandria, Egypt, but his precise dates are 


If z is a complex number,
say z = a + bi, then we
define $|z| = \sqrt{a^2 + b^2}$.
unknown; most historians believe he lived around 250 CE. His extant work
shows systematic algebraic procedures and notation, but his leaps of intuition 
strongly suggest that he was thinking geometrically; indeed, Newton called 
Diophantus’s discussion of Pythagorean triples the chord method (see Fig- 
ure 1.7). Thus, geometry (the Pythagorean Theorem) and applied problems 
(teaching) suggested an algebraic problem (find all Pythagorean triples), and 
we now return to geometry to solve it. Here is evidence that the distinction 
between algebra and geometry is an artificial one; both are parts of the same 
subject. 


Geometry and Pythagorean Triples 

Before we get into the technicalities of Diophantus’s classification of Pythag- 
orean triples, let’s note that geometry is lurking nearby. Exercise 1.14 above 
makes a natural observation: if (a, b, c) is a Pythagorean triple, then
(a/c, b/c) is
a point on the unit circle, the circle having radius 1, center the origin, and
equation $x^2 + y^2 = 1$. Dividing through by $c^2$ is a good idea. For example,
(6, 8, 10) is a “duplicate” of (3, 4, 5), and both of these Pythagorean triples 
determine the same point, (3/5, 4/5), on the unit circle. 

Here is the main idea of Diophantus. Even though those points arising from 
Pythagorean triples are special (for example, they lie in the first quadrant and 
both their coordinates are rational numbers), let’s parametrize all the points P 
on the unit circle. Choose a point on the unit circle “far away” from the first 
quadrant; the simplest is (-1, 0), and let ℓ = ℓ(P) be the line joining it to P.
We shall see that the slopes of such lines parametrize all the points on the
unit circle. In more detail, any line ℓ through (-1, 0) (other than the tangent)
intersects the unit circle in a unique second point, P = (x, y); let t be the
slope of ℓ. As t varies through all real numbers, $-\infty < t < \infty$, the intersection
points P of ℓ and the unit circle trace out the entire circle (except for (-1, 0)).

Proposition 1.2. The points P on the unit circle (other than (-1, 0)) are
parametrized as

$$P = \left(\frac{1 - t^2}{1 + t^2}, \frac{2t}{1 + t^2}\right),$$

where $-\infty < t < \infty$.
Proof. The line through points (a, b) and (c, d) has equation y - b = t(x - a),
where t = (d - b)/(c - a), so the line ℓ through (-1, 0) and a point P =
(x, y) on the unit circle has an equation of the form y = t(x + 1), so that
x = (y - t)/t. Thus, (x, y) is a solution of the system

$$y = t(x + 1)$$
$$x^2 + y^2 = 1.$$

An obvious solution of this system is (-1, 0), because this point lies on both
the line and the circle. Let’s find x and y in terms of t. If the slope t = 0,
then ℓ is the x-axis and the other solution is (1, 0). To find the solutions when
t ≠ 0, eliminate x: the equations

$$\frac{y - t}{t} = x \quad \text{and} \quad x^2 + y^2 = 1$$

give

$$\left(\frac{y - t}{t}\right)^2 + y^2 = 1.$$

Expanding and simplifying, we obtain

$$y\left[(1 + t^2)y - 2t\right] = 0.$$

We knew at the outset that y = 0 makes this true. If y ≠ 0, then canceling
gives

$$y = \frac{2t}{1 + t^2},$$

and solving for x gives

$$x = \frac{y - t}{t} = \frac{\frac{2t}{1 + t^2} - t}{t} = \frac{1 - t^2}{1 + t^2}.$$
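As a sanity check on the parametrization $P = \left(\frac{1 - t^2}{1 + t^2}, \frac{2t}{1 + t^2}\right)$, the following sketch evaluates it in exact rational arithmetic and recovers the slope of the line through (-1, 0) and P. The function names are ours, and exact `Fraction` arithmetic is a convenience we chose so that no rounding enters the check.

```python
from fractions import Fraction


def circle_point(t: Fraction) -> tuple[Fraction, Fraction]:
    """Second intersection of the unit circle with the line of slope t through (-1, 0)."""
    denom = 1 + t * t
    return (1 - t * t) / denom, 2 * t / denom


def slope_from_point(x: Fraction, y: Fraction) -> Fraction:
    """Slope of the line joining (-1, 0) to (x, y); requires x != -1."""
    return y / (x + 1)


t = Fraction(2, 3)
x, y = circle_point(t)
print(x, y)                    # 5/13 12/13
print(x * x + y * y == 1)      # True
print(slope_from_point(x, y))  # 2/3
```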

In Exercise 1.12 on page 7, we saw that every Pythagorean triple (a, b, c)
arises from a right triangle △(a, b, c) having sides of integral lengths a, b, c.
Conversely, the Pythagorean Theorem says that every right triangle △(a, b, c)
whose sides have integral length gives the Pythagorean triple (a, b, c). Thus,
Pythagorean triples and certain right triangles are merely two ways to view the
same idea, one algebraic, one geometric. At any given time, we will adopt that
viewpoint which is most convenient.

We have already run across distinct Pythagorean triples that are essentially
the same; Exercise 1.10 on page 7 shows that if (a, b, c) is a Pythagorean
triple, then so is (ma, mb, mc), where m is a positive integer. The right triangles
△(a, b, c) and △(ma, mb, mc) determined by these Pythagorean triples
are similar, for their sides are proportional. More generally, the Pythagorean
triples (6, 8, 10) and (9, 12, 15) are not really different, for each arises from
(3, 4, 5); however, neither (6, 8, 10) nor (9, 12, 15) is obtained from the other
by multiplying its terms by some integer m.

Definition. Two Pythagorean triples (a, b, c) and (u, v, z) are similar if their
right triangles △(a, b, c) and △(u, v, z) are similar triangles.

The method of Diophantus will give a formula for certain special Pythagorean
triples, and it will then show that every Pythagorean triple is similar to a
special one.


Definition. A point (x, y) in the plane is a rational point if both x and y are 
rational numbers. 

A Pythagorean point is a rational point in the first quadrant, lying on the 
unit circle, and above the diagonal line with equation y = x. 

Remember that we regard Pythagorean triples (a, b, c) and (b, a, c) as the
same. Recall some analytic geometry: if a ≠ b, P = (a, b), and Q = (b, a),
then the diagonal is the perpendicular bisector of the segment PQ. (The line
through P and Q has equation y = -x + a + b; it is perpendicular to the
diagonal for the product of their slopes is -1; the line intersects the diagonal
in the point $(\tfrac{a+b}{2}, \tfrac{a+b}{2})$, which is equidistant from P and Q. If a < b, then P
is above the diagonal and Q is below.)

Proposition 1.3. A triple (a, b, c) of integers is a Pythagorean triple if and
only if (a/c, b/c) is a Pythagorean point.


Proof. Let (a, b, c) be a Pythagorean triple. Dividing both sides of the defining
equation $a^2 + b^2 = c^2$ by $c^2$ gives

$$(a/c)^2 + (b/c)^2 = 1,$$

so that the triple gives an ordered pair of positive rational numbers (x, y) =
(a/c, b/c) with $x^2 + y^2 = 1$. Thus, the rational point P = (x, y) lies in the
first quadrant. As both (a, b, c) and (b, a, c) are the same Pythagorean triple,
we may assume that

$$x = a/c < b/c = y,$$

so that (x, y) lies above the diagonal line with equation y = x. Hence, (x, y)
is a Pythagorean point.

Conversely, let’s now see that a Pythagorean point (x, y) gives rise to a
Pythagorean triple. Write the rational numbers x < y with the same denominator,
say, x = a/c and y = b/c, where a, b, and c are positive integers and
a < b < c. Now

$$1 = x^2 + y^2 = \frac{a^2}{c^2} + \frac{b^2}{c^2},$$

so that $a^2 + b^2 = c^2$ and hence (a, b, c) is a Pythagorean triple. ■


In summary, the problem of finding all Pythagorean triples corresponds to 
the problem of finding all Pythagorean points. This is exactly what the geo- 
metric idea of Diophantus does. In fact, a Pythagorean point (x, y) gives rise 
to infinitely many Pythagorean triples. Write the coordinates with another de- 
nominator, say x = u/z and y = v/z. The calculation at the end of the proof 
of Proposition 1.3 shows that (u, v, z) is another Pythagorean triple arising 
from (x, y). 
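The passage from a Pythagorean point to its triples can be sketched in a few lines. The function names below are hypothetical, and the `scale` parameter is our own device for choosing a larger common denominator, illustrating how one point yields many triples.

```python
from fractions import Fraction
from math import lcm


def point_from_triple(a: int, b: int, c: int) -> tuple[Fraction, Fraction]:
    """The rational point (a/c, b/c) determined by a triple (a, b, c)."""
    return Fraction(a, c), Fraction(b, c)


def triple_from_point(x: Fraction, y: Fraction, scale: int = 1) -> tuple[int, int, int]:
    """Write x and y over a common denominator c and return (a, b, c)."""
    c = lcm(x.denominator, y.denominator) * scale
    return (x * c).numerator, (y * c).numerator, c


x, y = point_from_triple(6, 8, 10)
print(x, y)                        # 3/5 4/5
print(triple_from_point(x, y))     # (3, 4, 5)
print(triple_from_point(x, y, 3))  # (9, 12, 15)
```

Here (6, 8, 10) and (9, 12, 15) both collapse to the point (3/5, 4/5), which is why neither needs to be an integer multiple of the other.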


Etymology. Here are sources of some common words of mathematics. 

• Proposition. From Latin, meaning a statement or something pictured in the 
mind. 
• Theorem. From the Greek word meaning “spectacle” or “contemplate.” Related
words are “theory” and “theater.” Theorems are important propositions.

• Corollary. From the Latin word meaning “flower.” In ancient Rome, it meant 
a “gratuity;” flowers were left as tips. In mathematics, corollaries follow eas- 
ily from theorems; they are gifts bequeathed to us. 

• Lemma. From Greek; it meant something taken for granted. In mathematics 
nowadays, it is usually a technical result, a minor theorem, which can be 
used in the course of proving a more important theorem. 

• Proof. From Medieval French, meaning an argument from evidence estab- 
lishing the truth. The adage, “The exception proves the rule,” uses the word 
in the sense of testing: it originally meant a kind of indirect proof. We test 
whether a rule is true by checking whether an exception to it leads to a con- 
tradiction. Nowadays, this adage seems to have lost its meaning. 


The Method of Diophantus 

Proposition 1.2 parametrizes all the points $P$ on the unit circle other than
$(-1, 0)$. We are now going to see which values of $t$ produce Pythagorean
points: rational points on the unit circle lying in the first quadrant above the
diagonal line with equation $y = x$.

Theorem 1.4. Let $P = (x, y) \ne (-1, 0)$ be a point on the unit circle, and let
$t$ be the slope of the line $\ell$ joining $(-1, 0)$ and $P$.

(i) The slope $t$ is a rational number if and only if $P$ is a rational point.

(ii) The point $P$ is a Pythagorean point if and only if $t$ is a rational number
satisfying $\sqrt{2} - 1 < t < 1$.


Proof. (i) The parametrization $P = (x, y)$ gives a pair of equations:

\[ x = \frac{1 - t^2}{1 + t^2} \quad\text{and}\quad y = \frac{2t}{1 + t^2}. \]

Clearly, if $t$ is rational, then both $x$ and $y$ are rational. Conversely, if
$P = (x, y)$ is a rational point, then the slope $t$ of $\ell$ is $t = \frac{y}{x + 1}$,
and so $t$ is a rational number.

(ii) Pythagorean points correspond to rational points on the unit circle that lie
in the first quadrant above the line $y = x$. Points on the circle lying in the
first quadrant arise from lines having slope $t$ with $0 < t < 1$. The point
in the first quadrant that is the intersection of the unit circle and the line
$y = x$ is $(\frac{\sqrt{2}}{2}, \frac{\sqrt{2}}{2})$, and the slope of the line
joining $(-1, 0)$ to $(\frac{\sqrt{2}}{2}, \frac{\sqrt{2}}{2})$ is

\[ \frac{\sqrt{2}/2}{1 + \sqrt{2}/2} = \sqrt{2} - 1 \approx .414. \]

Therefore, Pythagorean points correspond to the lines $\ell$ through $(-1, 0)$
having rational slope $t$ satisfying $\sqrt{2} - 1 < t < 1$. ■


The slope of the line joining 
(-1,0) to (0, 1) is 1. 
Let’s look at this more closely. If $t = p/q$ is a rational number between
$\sqrt{2} - 1$ and 1, then the Pythagorean point it gives can be expressed in terms of
$p$ and $q$:

\[ \left( \frac{1 - t^2}{1 + t^2}, \frac{2t}{1 + t^2} \right)
   = \left( \frac{1 - (p/q)^2}{1 + (p/q)^2}, \frac{2(p/q)}{1 + (p/q)^2} \right)
   = \left( \frac{q^2 - p^2}{q^2 + p^2}, \frac{2qp}{q^2 + p^2} \right). \tag{1.4} \]


Theorem 1.5 (Diophantus). Every Pythagorean triple $(a, b, c)$ is similar to a
Pythagorean triple of the form

\[ (2qp, q^2 - p^2, q^2 + p^2), \]

where $p$ and $q$ are positive integers with $q > p > \sqrt{2} - 1$.


Proof. Since $(a, b, c)$ is a Pythagorean triple, $P = (a/c, b/c)$ is a Pythagorean
point. By Eq. (1.4),

\[ \left( \frac{a}{c}, \frac{b}{c} \right)
   = \left( \frac{1 - t^2}{1 + t^2}, \frac{2t}{1 + t^2} \right)
   = \left( \frac{q^2 - p^2}{q^2 + p^2}, \frac{2qp}{q^2 + p^2} \right). \]

It follows that $\triangle(a, b, c)$ is similar to $\triangle(2pq, q^2 - p^2, q^2 + p^2)$, because their
sides are proportional. Therefore, the Pythagorean triple $(a, b, c)$ is similar to
$(2qp, q^2 - p^2, q^2 + p^2)$, as claimed. ■


How to Think About It. The strategy of Diophantus is quite elegant. The
problem of determining all Pythagorean triples is reduced from finding three
unknowns, $a$, $b$, and $c$, to two unknowns, $x = a/c$ and $y = b/c$, to one
unknown, $t = p/q$. In effect, all Pythagorean triples are parametrized by $t$;
that is, as $t$ varies over all rational numbers between $\sqrt{2} - 1$ and 1, the formulas
involving $t$ vary over all Pythagorean points and hence over all Pythagorean
triples.
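
The parametrization by $t = p/q$ can be tried out numerically. What follows is a minimal Python sketch, not part of the original text: the function names are illustrative choices, and exact rational arithmetic from the standard `fractions` module is assumed. It evaluates Eq. (1.4) at $t = p/q$ and tests the condition $\sqrt{2} - 1 < t < 1$ without floating point, using the fact that $p/q > \sqrt{2} - 1$ exactly when $(p + q)^2 > 2q^2$.

```python
from fractions import Fraction


def is_pythagorean_slope(p, q):
    """Return True when sqrt(2) - 1 < p/q < 1 for positive integers p, q."""
    # p/q > sqrt(2) - 1  <=>  (p + q)/q > sqrt(2)  <=>  (p + q)^2 > 2 q^2
    return 0 < p < q and (p + q) ** 2 > 2 * q * q


def pythagorean_point(p, q):
    """Evaluate Eq. (1.4) at t = p/q and return the point (x, y)."""
    t = Fraction(p, q)
    return (1 - t * t) / (1 + t * t), 2 * t / (1 + t * t)


def triple_from_slope(p, q):
    """Return the triple (2qp, q^2 - p^2, q^2 + p^2) of Theorem 1.5."""
    return 2 * q * p, q * q - p * p, q * q + p * p


if __name__ == "__main__":
    for p, q in [(1, 2), (2, 3), (3, 4), (1, 3)]:
        x, y = pythagorean_point(p, q)
        print(p, q, is_pythagorean_slope(p, q), (x, y), triple_from_slope(p, q))
```

For instance, $t = 1/2$ lies between $\sqrt{2} - 1$ and 1 and yields the point $(3/5, 4/5)$, while $t = 1/3$ is too small and gives a rational point below the diagonal.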


We are tacitly using a technique of proof called Infinite Descent. If, for a
given positive integer $n$ with certain properties, there always exists a
strictly smaller positive integer $n_1$ having the same properties, then
there are infinitely many such integers. But this is impossible; there are only
finitely many integers with $n > n_1 > n_2 > \cdots > 0$.


We can now show that the Babylonians had, in fact, found all Pythagorean 
triples. 

Corollary 1.6. Every Pythagorean triple is similar to one arising from the 
Babylonian method. 

Proof. By Theorem 1.5, every Pythagorean triple is similar to one of the form
$(2qp, q^2 - p^2, q^2 + p^2)$, where $q > p$ are positive integers. If both $q$ and $p$ are
even, then we can replace $q > p$ by $\frac{1}{2}q > \frac{1}{2}p$, obtaining a Pythagorean triple
$(\frac{1}{2}qp, \frac{1}{4}(q^2 - p^2), \frac{1}{4}(q^2 + p^2))$ similar to the original one. If both parameters
of the new triple are still even, replace $\frac{1}{2}q > \frac{1}{2}p$ by $\frac{1}{4}q > \frac{1}{4}p$. Eventually, we
arrive at a Pythagorean triple $(2rs, r^2 - s^2, r^2 + s^2)$, similar to the original
triple, that arises from parameters $r > s$, at least one of which is odd.
There are two possibilities. If $r$ and $s$ have different parity, define $m = r + s$
and $n = r - s$. Both $m$ and $n$ are odd, and the Pythagorean triple given by the
Babylonian method from $m > n$ is

\[ B = (mn, \tfrac{1}{2}(m^2 - n^2), \tfrac{1}{2}(m^2 + n^2)). \]

Substitute:

\[ mn = (r + s)(r - s) = r^2 - s^2, \quad \tfrac{1}{2}(m^2 - n^2) = 2rs, \quad\text{and}\quad \tfrac{1}{2}(m^2 + n^2) = r^2 + s^2. \]

Thus, the Pythagorean triple $B$ is similar to $(2rs, r^2 - s^2, r^2 + s^2)$.

If both $r$ and $s$ are odd, then the Pythagorean triple given by the Babylonian
method from $r > s$ is $(rs, \frac{1}{2}(r^2 - s^2), \frac{1}{2}(r^2 + s^2))$, which is similar to
$(2rs, r^2 - s^2, r^2 + s^2)$. ■
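
The case analysis in this proof translates directly into a short procedure. The sketch below is an illustration under stated assumptions rather than anything from the original text: the function names are hypothetical, and two triples are treated as similar when they agree after dividing out the common factor and sorting the sides. It halves the parameters while both are even, then picks Babylonian parameters according to parity.

```python
from math import gcd


def diophantus_triple(q, p):
    """Triple (2qp, q^2 - p^2, q^2 + p^2) from Theorem 1.5."""
    return 2 * q * p, q * q - p * p, q * q + p * p


def babylonian_triple(m, n):
    """Triple (mn, (m^2 - n^2)/2, (m^2 + n^2)/2) for odd m > n > 0."""
    assert m > n > 0 and m % 2 == 1 and n % 2 == 1
    return m * n, (m * m - n * n) // 2, (m * m + n * n) // 2


def normalize(triple):
    """Divide out the common factor and sort, so similar triples compare equal."""
    d = gcd(gcd(triple[0], triple[1]), triple[2])
    return tuple(sorted(side // d for side in triple))


def babylonian_parameters(q, p):
    """Follow the proof: halve while both are even, then split on parity."""
    r, s = q, p
    while r % 2 == 0 and s % 2 == 0:
        r, s = r // 2, s // 2
    if (r - s) % 2 == 1:      # r and s have different parity
        return r + s, r - s
    return r, s               # r and s are both odd


if __name__ == "__main__":
    for q, p in [(2, 1), (3, 1), (4, 2), (6, 4), (5, 2)]:
        m, n = babylonian_parameters(q, p)
        same = normalize(diophantus_triple(q, p)) == normalize(babylonian_triple(m, n))
        print((q, p), (m, n), same)
```

Comparing normalized triples is a design shortcut: for integer-sided right triangles, proportional sides amount to having the same triple once common factors are removed.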


Not every Pythagorean triple $(a, b, c)$ is equal to $(2qp, q^2 - p^2, q^2 + p^2)$
for some $q > p$, nor does the theorem say that it is; the theorem asserts only
that $(a, b, c)$ is similar to a Pythagorean triple arising from the formula. For
example, let us show that (9, 12, 15) is not of this form. Since the leg 9 is odd,
the even leg 12 must be $2qp$, so that $qp = 6$, and the only possible parameters
are $3 > 2$ or $6 > 1$. But $3 > 2$ gives (5, 12, 13) and $6 > 1$ gives (12, 35, 37),
neither of which is similar to (9, 12, 15). However, (9, 12, 15) is similar to
(3, 4, 5), and (3, 4, 5) arises from $2 > 1$.

A Pythagorean triple $(a, b, c)$ is primitive if there is no integer $d > 1$ that
is a divisor of $a$, $b$, and $c$. Thus, (3, 4, 5) is primitive but (9, 12, 15) is not.
In Theorem 1.25, we’ll give a rigorous proof that every Pythagorean triple is 
similar to exactly one primitive Pythagorean triple. 
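
Both observations, that (9, 12, 15) does not come directly from the formula and that it reduces to the primitive triple (3, 4, 5), can be checked by a small search. This is a hedged sketch with hypothetical function names; it simply tries every $q > p > 0$ with $q^2 + p^2 = c$.

```python
from math import gcd, isqrt


def primitive_part(triple):
    """Divide a triple by the greatest common divisor of its entries."""
    d = gcd(gcd(triple[0], triple[1]), triple[2])
    return tuple(side // d for side in triple)


def is_primitive(triple):
    """A triple is primitive when no integer d > 1 divides all three entries."""
    return primitive_part(triple) == tuple(triple)


def formula_parameters(triple):
    """Search for q > p > 0 whose triple (2qp, q^2 - p^2, q^2 + p^2) has the
    same legs and hypotenuse as the given triple; return None if none exist."""
    a, b, c = triple
    for q in range(1, isqrt(c) + 1):
        for p in range(1, q):
            if q * q + p * p == c and {2 * q * p, q * q - p * p} == {a, b}:
                return q, p
    return None


if __name__ == "__main__":
    for t in [(3, 4, 5), (9, 12, 15), (5, 12, 13)]:
        print(t, is_primitive(t), primitive_part(t), formula_parameters(t))
```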


Exercises 

1.19 Find $q$ and $p$ in Theorem 1.5 for each of the following Pythagorean triples.

(i) (7, 24, 25).

Answer. $q = 4$ and $p = 3$.

(ii) (129396, 261547, 291805).

Answer. $q = 526$ and $p = 123$.

1.20 * Show that every Pythagorean triple $(x, y, z)$ with $x$, $y$, $z$ having no common
factor $d > 1$ is of the form

\[ (r^2 - s^2, 2rs, r^2 + s^2) \]

for positive integers $r > s$ having no common factor $> 1$; that is,
$x = r^2 - s^2$, $y = 2rs$, $z = r^2 + s^2$.

1.21 A line in the plane with equation $y = mx + c$ is called a rational line if $m$ and $c$
are rational numbers. If $P$ and $Q$ are distinct rational points, prove that the line
joining them is a rational line.

1.22 A lattice point is a point in the plane whose coordinates are integers. Let $P =
(x, y)$ be a Pythagorean point and $\ell$ the line through $P$ and the origin. Prove that
if $Q = (a, b)$ is a lattice point on $\ell$ and $c$ is the distance from $Q$ to the origin,
then $(a, b, c)$ is a Pythagorean triple.

1.23 * Let $P = (x_0, y_0)$ be a Pythagorean point and $L$ the line joining $P$ and the origin
(so the equation of $L$ is $y = mx$, where $m = y_0/x_0$). Show that if $(a/c, b/c)$ is
a rational point on $L$, then $(a, b, c)$ is a Pythagorean triple.

1.24 Does every rational point in the plane correspond to a Pythagorean point? If so, 
prove it. If not, characterize the ones that do. 

Answer. No. For example, ( 7 , ^) does not correspond. 

1.25 * Prove the identity $(x^2 + y^2)^2 = (x^2 - y^2)^2 + (2xy)^2$.

1.26 * 

(i) Show that the same number can occur as a leg in two nonsimilar Pythagorean 
triangles. 

(ii) Prove that the area of $\triangle(a, b, c)$, a right triangle with integer side lengths, is
an integer.

(iii) A Heron triangle is a triangle with integer side lengths and area. Find a Heron 
triangle that is not a right triangle. 

Hint. Use parts (i) and (ii). 

1.27 Show that every integer $n \ge 3$ occurs as a leg of some Pythagorean triple.

Hint. The cases n even and n odd should be done separately. 

1.28 Distinct Pythagorean triples can have the same hypotenuse: both (33, 56, 65) and 
(16,63,65) are Pythagorean triples. Find another pair of distinct Pythagorean 
triples having the same hypotenuse. 

1.29 * If $(\cos\theta, \sin\theta)$ is a rational point, prove that both $\cos(\theta + 30^\circ)$ and $\sin(\theta + 30^\circ)$
are irrational.


Fermat’s Last Theorem 

About fourteen centuries after Diophantus, Fermat (1601-1665) proved that
there are no positive integers $a, b, c$ with $a^4 + b^4 = c^4$. He was studying
his copy of Diophantus’s Arithmetica, published in 1621, and he wrote in its
margin,

... it is impossible for a cube to be written as a sum of two cubes or a
fourth power to be written as a sum of two fourth powers or, in general,
for any number which is a power greater than the second to be written
as a sum of two like powers. I have discovered a truly marvelous demonstration
of this proposition which this margin is too narrow to contain.


Fermat was not the first 
mathematician to write a 
marginal note in a copy 
of Diophantus. Next to 
the same problem, the 
Byzantine mathematician 
Maximus Planudes wrote, 
Thy soul, Diophantus, be
with Satan because of the
difficulty of your theorems.


Fermat never returned to this problem (at least, not publicly) except for his
proof of the case $n = 4$, which we give below. The statement: If $n > 2$,
there are no positive integers $a, b, c$ with $a^n + b^n = c^n$, was called Fermat’s
Last Theorem, perhaps in jest. The original text in which Fermat wrote
his famous marginal note is lost today. Fermat’s son edited the next edition
of Diophantus, published in 1670; this version contains Fermat’s annotations,
including his famous “Last Theorem;” it contained other unproved assertions
as well, most true, some not. By the early 1800s, only Fermat’s Last Theorem
remained undecided. It became a famous problem, resisting the attempts of
mathematicians of the highest order for 350 years, until it was finally proved,
in 1995, by Wiles. His proof is very sophisticated, and most mathematicians
believe that Fermat did not have a correct proof. The quest for a proof of Fermat’s
Last Theorem generated much beautiful mathematics. In particular, it led
to an understanding of complex numbers, factorization, and polynomials. We’ll
see, in the Epilog, that extending the method of Diophantus from quadratics to
cubics involves elliptic curves, the study of which is the setting for Wiles’
proof of Fermat’s Last Theorem.

Fermat proved the next theorem (which implies the case $n = 4$ of Fermat’s
Last Theorem) because he was interested in the geometric problem of determining
which right triangles having all sides of rational length have integer
area (we’ll soon discuss this problem in more detail).

Theorem 1.7 (Fermat). There is no triple $(x, y, z)$ of positive integers with

\[ x^4 + y^4 = z^2. \tag{1.5} \]

This proof is not difficult, but it uses several elementary divisibility results we’ll
prove later. Since we feel that this is the appropriate place for this theorem, we’ll
just refer to the needed things.

Proof. The proof will be by infinite descent (Fermat invented infinite descent
for this very problem). Given a triple of positive integers $(x, y, z)$ satisfying
Eq. (1.5), we’ll show there is another triple $(u, v, w)$ of the same sort with
$w < z$, and so repeating this process leads to a contradiction.

Let’s say that integers $x$ and $y$ are relatively prime if there is no integer
$d > 1$ dividing both of them; that is, it’s not true that $x = da$ and $y = db$.

We can assume that $x$ and $y$ are relatively prime, for otherwise a common
factor of $x$ and $y$ would also be a factor of $z$, and we could divide it out. It
follows (and we’ll prove it in the next chapter) that $x^2$ and $y^2$ are also relatively
prime. And note that $x^4 + y^4 = z^2$ implies that

\[ (x^2)^2 + (y^2)^2 = z^2, \]

so that $(x^2, y^2, z)$ form a Pythagorean triple.

We also observe that $x^2$ and $y^2$ can’t both be odd; if $x^2 = 2k + 1$ and
$y^2 = 2j + 1$, then

\[ (2k + 1)^2 + (2j + 1)^2 = z^2. \]

Expanding and collecting terms gives $z^2 = 4h + 2$ for some integer $h$. But you
can check that the square of any integer is either of the form $4h$ or $4h + 1$.

We can now assume that $(x^2, y^2, z)$ is a Pythagorean triple in which $x$ and
$y$ are relatively prime, $x$ is odd, and $y$ is even. By Exercise 1.20 on page 13,
there are relatively prime integers $r$ and $s$ with $r > s > 0$ such that

\[ x^2 = r^2 - s^2, \quad y^2 = 2rs, \quad\text{and}\quad z = r^2 + s^2. \]

The first equation says that $x^2 + s^2 = r^2$; that is, $(x, s, r)$ is another Pythagorean
triple with $x$ odd. Moreover, $x$ and $s$ have no common factor (why?), so
that Exercise 1.20 gives relatively prime integers $a$ and $b$ such that

\[ x = a^2 - b^2, \quad s = 2ab, \quad\text{and}\quad r = a^2 + b^2. \]

Now,

\[ y^2 = 2rs = 2(a^2 + b^2)(2ab) = 4ab(a^2 + b^2). \]

Since $y$ is even, we have an equation in integers:

\[ \left( \frac{y}{2} \right)^2 = ab(a^2 + b^2). \tag{1.6} \]

As $a$ and $b$ are relatively prime (no common factor $d > 1$), each pair from
the three factors on the right-hand side of Eq. (1.6) is relatively prime. Since
the left-hand side $(y/2)^2$ is a square, each factor on the right is a square
(Exercise 2.12 on page 59). In other words, there are integers $u$, $v$, and $w$ such
that

\[ a = u^2, \quad b = v^2, \quad\text{and}\quad a^2 + b^2 = w^2. \]

And, since $a$ and $b$ are relatively prime, so, too, are $u$ and $v$ relatively prime.
Hence, we have

\[ u^4 + v^4 = w^2. \]

This is our “smaller” solution to Eq. (1.5), for

\[ 0 < w < w^2 = a^2 + b^2 = r < r^2 < r^2 + s^2 = z. \]

We can now repeat this process on $(u, v, w)$. By infinite descent, there is no
solution to Eq. (1.5). ■
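
A computer search cannot replace the descent argument, but it can illustrate two of its ingredients. The sketch below is a sanity check only, with a hypothetical function name and an arbitrary search bound: it looks for small solutions of Eq. (1.5) and confirms, over a sample of integers, that squares leave remainder 0 or 1 on division by 4.

```python
from math import isqrt


def solutions_up_to(bound):
    """List all (x, y, z) with 1 <= x <= y <= bound and x^4 + y^4 = z^2."""
    found = []
    for x in range(1, bound + 1):
        for y in range(x, bound + 1):
            total = x ** 4 + y ** 4
            z = isqrt(total)          # integer square root, rounded down
            if z * z == total:
                found.append((x, y, z))
    return found


def square_residues_mod_4(limit):
    """Collect the remainders of n^2 on division by 4 for 0 <= n < limit."""
    return {(n * n) % 4 for n in range(limit)}


if __name__ == "__main__":
    print(solutions_up_to(200))        # Theorem 1.7 says this list is empty
    print(square_residues_mod_4(100))  # only the residues 0 and 1 occur
```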

Corollary 1.8 (Fermat’s Last Theorem for Exponent 4). There are no positive
integers $x, y, z$ with

\[ x^4 + y^4 = z^4. \]

Proof. If such a triple existed, we’d have

\[ x^4 + y^4 = (z^2)^2, \]

and that’s impossible, by Theorem 1.7. ■

Call an integer $n > 2$ good if there are no positive integers $a, b, c$ with
$a^n + b^n = c^n$. If $n$ is good, then so is any multiple $nk$ of it. Otherwise, there
are positive integers $r, s, t$ with $r^{nk} + s^{nk} = t^{nk}$, and this gives the contradiction
$a^n + b^n = c^n$, where $a = r^k$, $b = s^k$, and $c = t^k$. For example, Corollary 1.8
shows that any positive integer of the form $4k$ is good. Since every
$n > 2$ is a product of primes, it is either a multiple of 4 or a multiple of an odd
prime; it follows that Fermat’s Last Theorem would be true if every odd prime is good.

Connections: Congruent Numbers 

Fermat’s motivation for Theorem 1.7 came, not from a desire to prove there
are no non-trivial integer solutions to $x^4 + y^4 = z^4$, but from a problem
in the intersection of arithmetic and geometry. In more detail, suppose that
$\triangle = \triangle(a, b, c)$ is the right triangle arising from a Pythagorean triple $(a, b, c)$.
Since $\triangle$ is a right triangle, the leg $a$ is an altitude and the area of $\triangle(a, b, c)$ is
$\frac{1}{2}ab$; since $(a, b, c)$ is a Pythagorean triple, the area is an integer (Exercise 1.8
on page 6). Tipping this statement on its head, we ask which integers are areas
of right triangles having integer side-lengths. Certainly 6 is, because it’s the
area of $\triangle(3, 4, 5)$. But 5 is not the area of such a triangle (Exercise 1.9 on
page 7).

However, we claim that 5 is the area of a right triangle whose side-lengths
are rational numbers. Consider the Pythagorean triple (9, 40, 41); its right triangle
$\triangle = \triangle(9, 40, 41)$ has area $\frac{1}{2}(9 \cdot 40) = 180$. Now $180 = 36 \cdot 5$. Scaling
the side-lengths of $\triangle$ by $\frac{1}{6}$ scales the area by $\frac{1}{36}$, so that $\triangle(\frac{3}{2}, \frac{20}{3}, \frac{41}{6})$ has area

\[ 180/36 = 5. \]
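
The scaling step is easy to confirm with exact fractions. The following is a small sketch, with illustrative helper names that do not come from the text, showing that dividing the sides of (9, 40, 41) by 6 keeps the right angle and brings the area down from 180 to 5.

```python
from fractions import Fraction


def scale(triple, factor):
    """Multiply every side length by the same rational factor."""
    return tuple(Fraction(side) * factor for side in triple)


def is_right_triangle(a, b, c):
    """Check the Pythagorean relation with c as the hypotenuse."""
    return a * a + b * b == c * c


def area(a, b, c):
    """Area of a right triangle with legs a and b (c is unused)."""
    return Fraction(a) * Fraction(b) / 2


if __name__ == "__main__":
    small = scale((9, 40, 41), Fraction(1, 6))
    print(small, is_right_triangle(*small), area(*small))
```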

So, the question arises: “Is every integer the area of a right triangle with 
rational side-lengths?” Fermat showed that 1 and 2 are not, and his proof for 2 
involved Eq. (1.5). 

Theorem 1.9. There is no right triangle with rational side-lengths and area 2. 

Proof. Suppose, on the contrary, that the rational numbers $r, s, t$ are the lengths
of the sides of a right triangle with area 2. Then we have two equations:

\[ r^2 + s^2 = t^2 \]
\[ \tfrac{1}{2} rs = 2. \]

Multiply the first equation by $r^2$ to obtain

\[ r^4 + (rs)^2 = (rt)^2, \]

so that (since $rs = 4$),

\[ r^4 + 2^4 = (rt)^2. \]

Write the rational numbers $r$ and $t$ as fractions with the same denominator:
$r = a/c$ and $t = b/c$. When we clear denominators, we get $a^4 + 2^4 c^4 = (ab)^2$,
an equation in integers $x, y, z$ (here $x = a$, $y = 2c$, and $z = ab$) of the form

\[ x^4 + y^4 = z^2. \]

This is Eq. (1.5), and Theorem 1.7 says that this cannot occur. ■

So, not every positive integer is the area of a right triangle with rational 
side-lengths. 

Definition. A congruent number is a positive integer n that is the area of a 
right triangle having rational side-lengths. 


Theorem 1.9 says that 2 is not a congruent number. Using similar ideas, 
Fermat showed that 1 is not a congruent number (Exercise 1.31 below). 

One way to generate congruent numbers is to scale a Pythagorean triple
using the largest perfect square that divides its area. For example, the area of
$\triangle(7, 24, 25)$ is $84 = 2^2 \cdot 21$. Since $4 = 2^2$ is the largest perfect square in 84,
scaling the sides by $\frac{1}{2}$ will produce a triangle of area 21, so that 21 is the area of
$\triangle(\frac{7}{2}, 12, \frac{25}{2})$ and, hence, 21 is a congruent number. More generally, we have

Proposition 1.10. Let $(a, b, c)$ be a Pythagorean triple. If its right triangle
$\triangle(a, b, c)$ has area $m^2 n$, where $n$ is squarefree, then $n$ is a congruent number.
Moreover, every squarefree congruent number is obtained in this way.


We have already used this 
method on the Pythag- 
orean triple (9,40, 41) 
when we showed that 5 is 
a congruent number. 


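Proof. Since $(a, b, c)$ is a Pythagorean triple, $\triangle = \triangle(a, b, c)$ is a right triangle.
Now $\operatorname{area}(\triangle) = m^2 n = \frac{1}{2}ab$, so that

\[ \operatorname{area}\left( \triangle\left( \frac{a}{m}, \frac{b}{m}, \frac{c}{m} \right) \right)
   = \frac{1}{2} \left( \frac{a}{m} \cdot \frac{b}{m} \right) = \frac{m^2 n}{m^2} = n, \]

and so $n$ is a congruent number.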
This triangle was found by Don Zagier, using sophisticated techniques
investigating elliptic curves, and using a substantial amount of computer power
(see [19] for more details).


A readable account of the congruent problem, with more examples than we provide
here, can be found at

www.math.uconn.edu/~kconrad/blurbs/


Conversely, if $n$ is a square-free congruent number, then there are rational
numbers $r$, $s$, and $t$ so that

\[ r^2 + s^2 = t^2 \]
\[ \tfrac{1}{2} rs = n. \]

Clearing denominators, we find integers $a$, $b$, $c$, and $m$ so that

\[ a^2 + b^2 = c^2 \]
\[ \tfrac{1}{2} ab = m^2 n. \]
■

The first few congruent numbers are 

5, 6, 7, 13, 14, 15, 20, 21, 22, 23. 

In light of Exercise 1.33 on page 20, we now have a method for determin- 
ing all congruent numbers: generate the areas of all Pythagorean triangles (we 
know how to do that), and then divide out its largest perfect square factor: case 
closed. 

Not quite. The trouble with this method is that you have no idea how many
triangle areas to calculate before (if ever) you get to an area of $m^2 n$ for a
particular $n$. For some congruent numbers, it takes a long time. For example,
157 is a congruent number, but the smallest rational right triangle with area
157 has side lengths

\[ \frac{224403517704336969924557513090674863160948472041}{8912332268928859588025535178967163570016480830}, \]

\[ \frac{6803298487826435051217540}{411340519227716149383203}, \quad \frac{411340519227716149383203}{21666555693714761309610}. \]
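
Numbers of this size are awkward to check by hand but simple to check with exact rational arithmetic. In the sketch below, the variable names `a`, `b`, `c` are illustrative; the numerator 6803298487826435051217540 of the first leg is the value for which the product of the two legs comes out to exactly $2 \cdot 157$.

```python
from fractions import Fraction

# Legs and hypotenuse of the rational right triangle with area 157.
a = Fraction(6803298487826435051217540, 411340519227716149383203)
b = Fraction(411340519227716149383203, 21666555693714761309610)
c = Fraction(224403517704336969924557513090674863160948472041,
             8912332268928859588025535178967163570016480830)

if __name__ == "__main__":
    print(a * a + b * b == c * c)   # the right-triangle condition
    print(a * b / 2)                # the area
```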

A method for effectively determining whether or not an integer is a congru- 
ent number is an unsolved problem (this problem is at least a thousand years 
old, for historians have found it in manuscripts dating from the late tenth cen- 
tury). A detailed discussion of the Congruent Number Problem is in [19]. 
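
The naive search described above (list areas of Pythagorean triangles, then strip the largest square factor) can be sketched in a few lines. This is an illustration under assumptions, not an effective test: the function names and the bound `max_q` are hypothetical, and triples are generated from the formula $(2qp, q^2 - p^2, q^2 + p^2)$, whose triangle has area $qp(q^2 - p^2)$.

```python
def squarefree_part(n):
    """Divide n by its largest perfect-square factor."""
    d = 2
    while d * d <= n:
        while n % (d * d) == 0:
            n //= d * d
        d += 1
    return n


def congruent_numbers_from_triples(max_q):
    """Squarefree parts of the areas of all triangles with q <= max_q."""
    found = set()
    for q in range(2, max_q + 1):
        for p in range(1, q):
            area = q * p * (q * q - p * p)   # half of (2qp) * (q^2 - p^2)
            found.add(squarefree_part(area))
    return found


if __name__ == "__main__":
    found = congruent_numbers_from_triples(20)
    print(sorted(n for n in found if n < 25))
```

A number that is missing from the output is not thereby shown to be non-congruent; it may simply need larger parameters, which is exactly the difficulty the text points out.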


How to Think About It. Proposition 1.10 shows that every squarefree congruent
number $n$ is the area of a scaled Pythagorean triangle. But there might
be more than one Pythagorean triangle whose area has $n$ as its squarefree
part. The search for more than one rational right triangle with the same area
leads to some fantastic calculations. For example, we saw that 5 is the area of
$\triangle(\frac{3}{2}, \frac{20}{3}, \frac{41}{6})$, which comes from the Pythagorean triangle $\triangle(9, 40, 41)$ whose
area is $5 \cdot 6^2$. But 5 is also the area of

\[ \triangle\left( \frac{1519}{492}, \frac{4920}{1519}, \frac{3344161}{747348} \right), \]

and this comes from the Pythagorean triangle $\triangle(2420640, 2307361, 3344161)$
whose area is $5 \cdot 747348^2$.

As usual, this isn’t magic; in Chapter 9, we’ll show how to find infinitely 
many rational right triangles with the same congruent number as area. 
There’s a surprising connection between congruent numbers and 3-term
arithmetic sequences of perfect squares of rational numbers: positive rationals
$s^2 < t^2 < u^2$ with $u^2 - t^2 = t^2 - s^2$, like 1, 25, 49 (common difference 24)
and

\[ \frac{961}{36}, \frac{1681}{36}, \frac{2401}{36} \]

(common difference 20). Note that $24 = 4 \cdot 6$ and $20 = 4 \cdot 5$. So, for these
examples, at least, the common difference is 4 times a congruent number. This
suggests that something’s going on. One approach is due to Fibonacci.

Our two equations

\[ a^2 + b^2 = c^2 \]
\[ ab = 2n \]

might lead us to think that we could find $a$ and $b$ by finding their sum and
product, for this would lead to a quadratic equation whose roots are $a$ and $b$.
Well, we know $ab$, and

\[ (a + b)^2 = a^2 + b^2 + 2ab = c^2 + 4n. \]

So, $a + b = \sqrt{c^2 + 4n}$ (why can we take the positive square root?), and hence
$a$ and $b$ are roots of the quadratic equation

\[ x^2 - \sqrt{c^2 + 4n}\, x + 2n = 0. \]

The quadratic formula gives us $a$ and $b$:

\[ a = \frac{\sqrt{c^2 + 4n} + \sqrt{(c^2 + 4n) - 4(2n)}}{2} = \frac{\sqrt{c^2 + 4n} + \sqrt{c^2 - 4n}}{2} \]

and

\[ b = \frac{\sqrt{c^2 + 4n} - \sqrt{(c^2 + 4n) - 4(2n)}}{2} = \frac{\sqrt{c^2 + 4n} - \sqrt{c^2 - 4n}}{2}. \]

But we want $a$ and $b$ to be rational, so we want $c^2 \pm 4n$ to be perfect squares.
That produces an arithmetic sequence of three perfect squares:

\[ c^2 - 4n, \quad c^2, \quad c^2 + 4n. \]

There are details to settle, but that’s the gist of the proof of the following theo- 
rem. 

Theorem 1.11. An integer $n$ is a congruent number if and only if there is a
3-term arithmetic sequence of perfect squares whose common difference is $4n$.
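
One direction of this correspondence can be made concrete. The sketch below, with a hypothetical function name, assumes we are handed the rational square roots $s < t < u$ of three squares in arithmetic progression; it then takes $\sqrt{c^2 - 4n} = s$, $c = t$, and $\sqrt{c^2 + 4n} = u$ in the formulas for $a$ and $b$ obtained from the quadratic formula.

```python
from fractions import Fraction


def triangle_from_squares(s, t, u):
    """Given rationals s < t < u with u^2 - t^2 = t^2 - s^2 = 4n,
    return (a, b, c, n) for the right triangle with legs a, b and area n."""
    s, t, u = Fraction(s), Fraction(t), Fraction(u)
    difference = u * u - t * t
    if difference != t * t - s * s:
        raise ValueError("the squares are not in arithmetic progression")
    a = (u + s) / 2      # (sqrt(c^2 + 4n) + sqrt(c^2 - 4n)) / 2
    b = (u - s) / 2      # (sqrt(c^2 + 4n) - sqrt(c^2 - 4n)) / 2
    c = t
    n = difference / 4
    return a, b, c, n


if __name__ == "__main__":
    print(triangle_from_squares(1, 5, 7))
    print(triangle_from_squares(Fraction(31, 6), Fraction(41, 6), Fraction(49, 6)))
```

The sequence 1, 25, 49 leads back to the triangle with sides 4, 3, 5 and area 6, and the sequence with denominators 36 leads back to the rational triangle of area 5 obtained by scaling (9, 40, 41).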


Exercises 

1.30 * Show that 1 is not a congruent number. 

1.31 Show that there are no positive rational numbers $x$ and $y$ so that

\[ x^4 \pm 1 = y^2. \]






1.32 Show that if $n$ is a congruent number and $m$ is an integer, then $m^2 n$ is also a
congruent number.

1.33 Show that there are no right triangles with rational side-lengths whose area is a 
perfect square or twice a perfect square. 

1.34 Show that 7 and 14 are congruent numbers. 

1.35 Take It Further. Show that 13 is a congruent number. 

1.36 * Prove Theorem 1.11. 

1.3 Euclid 

Euclid of Alexandria (ca. 325 BCE-ca. 265 BCE) is one of the most prominent
mathematicians of antiquity. He is best known for The Elements, his treatise
consisting of thirteen books: six on plane geometry, four on number theory, 
and three on solid geometry. The Elements has been used for over two thou- 
sand years, which must make Euclid the leading mathematics teacher of all 
time. We do not know much about Euclid himself other than that he taught in 
Alexandria, Egypt around 270 BCE. We quote from Sir Thomas Heath [16], 
the great translator and commentator on The Elements. 

It is most probable that Euclid received his mathematical training in
Athens from the students of Plato; for most of the geometers who could
have taught him were of that school ... Pappus says ... such was (Euclid’s)
scrupulous fairness and his exemplary kindliness towards all who could
advance mathematical science to however small an extent; (he was) in
no wise contentious and, though exact, yet no braggart.

Eight hundred years after Euclid, Proclus (412 CE-485 CE) wrote:

Not much younger than these (pupils of Plato) is Euclid, who put together
The Elements, collecting many of Eudoxus’s theorems, perfecting
many of Theaetetus’s, and also bringing to irrefragable demonstration
the things which were only somewhat loosely proved by his predecessors.
This man lived in the time of the first Ptolemy (323 BCE-283 BCE). For
Archimedes, who came immediately after the first Ptolemy makes mention
of Euclid; and further they say that Ptolemy once asked him if there
were a shorter way to study geometry than The Elements, to which he
replied that there was no royal road to geometry. He is therefore younger
than Plato’s circle, but older than Eratosthenes and Archimedes; for
these were contemporaries, as Eratosthenes somewhere says.

The Elements is remarkable for the clarity with which its theorems are stated 
and proved. The standard of rigor was a goal (rarely achieved!) for the inven- 
tors of calculus centuries later. As Heath writes in the preface to the second 
edition of his translation [16] of The Elements, 

... so long as mathematics is studied, mathematicians will find it neces- 
sary and worthwhile to come back again and again ... to the twenty-two- 
centuries-old book which, notwithstanding its imperfections, remains the 
greatest elementary textbook in mathematics that the world is privileged 
to possess. 


Pappus (ca. 290 CE-
ca. 350 CE) was one
of the last great classic
geometers.
More than one thousand editions of The Elements have been published since
it was first printed in 1482. In the Encyclopedia Britannica, van der Waerden 
wrote, 

Almost from the time of its writing and lasting almost to the present, 
The Elements has exerted a continuous and major influence on human 
affairs. It was the primary source of geometric reasoning, theorems, and 
methods at least until the advent of non-Euclidean geometry in the 19th
century. It is sometimes said that, next to the Bible, The Elements may 
be the most translated, published, and studied of all the books produced 
in the Western world. 

Greek Number Theory 

In spite of the glowing reviews of The Elements, we must deviate a bit from
Euclid, for the Greeks, and Euclid in particular, recognized neither negative 
numbers nor zero. 


Notation. The natural numbers is the set

\[ \mathbb{N} = \{0, 1, 2, 3, \ldots\}. \]

The set of all integers, positive, negative, and 0, is denoted by

\[ \mathbb{Z} = \{\pm n : n \in \mathbb{N}\}. \]

We are going to assume that the set $\mathbb{N}$ of natural numbers satisfies a certain
property: a generalized version of Infinite Descent.


The set of integers is 
denoted by Z because the 
German word for numbers 
is Zahlen. 


Definition. The Least Integer Axiom (often called the Well-Ordering Axiom)
states that every nonempty collection $C$ of natural numbers contains a smallest
element; that is, there is a number $c_0 \in C$ with $c_0 \le c$ for all $c \in C$.

This axiom is surely plausible. If $0 \in C$, then $c_0 = 0$. If $0 \notin C$ and $1 \in C$,
then $c_0 = 1$. If $0, 1 \notin C$ and $2 \in C$, then $c_0 = 2$. Since $C$ is not empty, you
will eventually bump into $C$, and $c_0$ is the first number you’ll meet.

We now define some familiar terms. 


Note that the set of positive
rationals $\mathbb{Q}^+$ does not
satisfy an analogous
property: the nonempty
subset $\{x \in \mathbb{Q}^+ : x^2 > 2\}$
contains no smallest
element.


Definition. If $a$ and $b$ are integers, then $a$ divides $b$, denoted by

\[ a \mid b, \]

if there is an integer $c$ with $b = ca$. We also say that $a$ is a divisor (or a factor)
of $b$, and that $b$ is a multiple of $a$.

Example 1.12. Consider some special cases. Every number $a$ divides itself,
for $a = a \times 1$; similarly, 1 divides every number. Every number $a$ divides 0:
taking $c = 0$, we have $0 = a \times 0$. On the other hand, if 0 divides $b$, then $b = 0$,
for $b = 0 \times c = 0$. Note that $3 \mid 6$, because $6 = 3 \times 2$, but $3 \nmid 5$ (that is, 3 does
not divide 5): even though $5 = 3 \times \frac{5}{3}$, the fraction $\frac{5}{3}$ is not an integer. ▲

Lemma 1.13. If $a$ and $b$ are positive integers and $a \mid b$, then $a \le b$.


Note that 0 divides itself:
$0 \mid 0$ is true. Do not confuse
the notation $a \mid b$, which is
the relation “$a$ is a divisor
of $b$” with $a/b$, which is a
number. In particular, we
are not saying that 0/0 is a
number.
Inequalities are discussed
in Appendix A.4.


We allow products having 
only one factor; it’s okay to 
say that a single prime is a 
product of primes. 


Proof. There is a positive integer $c$ with $b = ca$; note that $1 \le c$, for 1 is
the smallest positive integer. Multiplying by the positive number $a$, we have
$a \le ac = b$. ■

Every integer $a$ has $1, -1, a, -a$ as divisors. A positive integer $a \ne 1$ having
only these divisors is called prime.

Definition. An integer $a$ is prime if $a \ge 2$ and its only divisors are $\pm 1$ and
$\pm a$; if $a \ge 2$ has other divisors, then it is called composite.

The first few primes are 2, 3, 5, 7, 11, 13, .... We will soon see that there
are infinitely many primes.

The reason we do not consider 1 to be a prime is that theorems about primes
would then require special cases treating the behavior of 1. For example, we
will prove later that every positive integer $a \ge 2$ has exactly one factorization
of the form $a = p_1 p_2 \cdots p_t$, where $p_1 \le p_2 \le \cdots \le p_t$ are primes. This
statement would be more complicated if we allowed 1 to be a prime.

Proposition 1.14. Every integer $a \ge 2$ is a product of primes.

Proof. Let $C$ be the set of all natural numbers $a \ge 2$ that are not products of
primes. If the proposition is false, then $C$ is nonempty, and the Least Integer
Axiom gives a smallest such integer, say, $c_0$. Since $c_0 \in C$, it is not prime;
hence, it factors, say, $c_0 = ab$, where $a, b \ne 1$. As $a \mid c_0$, we have $a \le c_0$, by
Lemma 1.13; but $a \ne c_0$, lest $b = 1$, so that $a < c_0$. Therefore, $a \notin C$, for
$c_0$ is the smallest number in $C$, and so $a$ is a product of primes: $a = p_1 \cdots p_m$
for $m \ge 1$. Similarly, $b$ is a product of primes: $b = q_1 \cdots q_n$. Therefore,
$c_0 = ab = p_1 \cdots p_m q_1 \cdots q_n$ is a product of primes, a contradiction, and so
$C$ is empty. ■
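
The idea of the proof, that a number which is not prime splits as $ab$ with both factors smaller, and that each factor can be handled the same way, also gives a simple recursive way to write down a factorization. The sketch below uses hypothetical function names and plain trial division; it is meant as an illustration of the argument, not as an efficient method.

```python
def smallest_divisor(n):
    """Return the smallest divisor d > 1 of an integer n >= 2."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n   # no divisor up to sqrt(n), so n itself is prime


def prime_factors(n):
    """Return a list of primes whose product is n, for n >= 2."""
    if n < 2:
        raise ValueError("n must be at least 2")
    a = smallest_divisor(n)
    if a == n:                 # n is prime: a product with one factor
        return [n]
    # n = a * b with 1 < a, b < n; factor both pieces recursively
    return prime_factors(a) + prime_factors(n // a)


if __name__ == "__main__":
    for n in [2, 60, 97, 1001]:
        print(n, prime_factors(n))
```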

Division and Remainders 

Dividing an integer $b$ by a positive integer $a$ gives

\[ b/a = q + r/a, \]

where $q$ is an integer and $0 \le r/a < 1$. If we clear denominators, we get the
statement $b = qa + r$, which involves only integers. For example, $\frac{22}{5} = 4 + \frac{2}{5}$
becomes $22 = 4 \cdot 5 + 2$.

[Figure 1.8. Division Algorithm, illustrating $b = 4a + r$.]

Euclid viewed division geometrically, as in Figure 1.8. Suppose B is a line 
segment of length b , and that A is a shorter segment of length a . Lay off copies 
of A along B as long as possible. If there’s nothing left over, then a is a divisor 
of b; if some segment of length, say r, is left over, then r is the remainder. 
Theorem 1.15 (Division Algorithm). If $a$ and $b$ are positive integers, then
there are unique (i.e., exactly one) integers $q$ (the quotient) and $r$ (the remainder)
with

\[ b = qa + r \quad\text{and}\quad 0 \le r < a. \]


Proof. We first prove that $q$ and $r$ exist; afterward, we’ll prove their uniqueness.

If $b < a$, set $q = 0$ and $r = b$. Now $b = qa + r = 0 \cdot a + b$, while
$0 \le b < a$. Hence, we may assume that $b \ge a$; that is, $b - a \ge 0$. Consider the
sequence $b > b - a > b - 2a > b - 3a > \cdots$. There must be an integer $q \ge 1$
with $b - qa \ge 0$ and $b - (q + 1)a < 0$ (this is just Infinite Descent, described on
page 12; in more down-to-earth language, there can be at most $b$ steps before
this sequence becomes negative). If we define $r = b - qa$, then $b = qa + r$.
We also have the desired inequalities. Clearly, $0 \le r$. If $r = b - qa \ge a$, then
$b - qa - a \ge 0$; that is, $b - (q + 1)a \ge 0$, contradicting the definition of $q$.

Let’s prove uniqueness. If there are integers $Q$ and $R$ with $b = Qa + R$
and $0 \le R < a$, then $qa + r = b = Qa + R$ and

\[ (Q - q)a = r - R. \]


The hypothesis of Theorem 1.15 can be weakened
to $a, b \in \mathbb{Z}$ and $a \ne 0$; the
inequalities for the remainder now read $0 \le r < |a|$.


If $a \le b$, the quotient $q$ is
the largest multiple $qa$ with
$qa \le b$. This is very much
the way young children are
taught to find the integer
quotient in division when $a$
and $b$ are small.


If Q f q, there is no loss in generality in assuming that Q > q; that is, 
0 < Q — q. By Lemma 1 . 13 , a < (Q — q)a = r — R. But r < a and R > 0 
gives r — R < r < a. Therefore, a <(Q — q)a = r — R < a; that is, a < a, 
a contradiction. Hence Q = q . It follows that R = r, and we are done. ■ 

For example, there are only two possible remainders after dividing by 2, 
namely, 0 and 1. An integer b is even if the remainder is 0; b is odd if the 
remainder is 1. Thus, either b = 2q or b = 2q + 1 . 

The equation $b = qa + r$ is of no value at all without the restriction on
the remainder $r$. For example, the equations $1000 = 3 \cdot 25 + 925$ and
$1000 = 2 \cdot 53 + 894$ are true and useless.
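The existence half of the proof of Theorem 1.15 is effectively a procedure: keep subtracting a from b until what is left is smaller than a. The following Python sketch follows that repeated subtraction for positive integers; the function name `divide` is our own choice for illustration and does not come from the text.

```python
def divide(b: int, a: int) -> tuple[int, int]:
    """Return (q, r) with b == q*a + r and 0 <= r < a, for positive a and b."""
    if a <= 0 or b <= 0:
        raise ValueError("this sketch follows Theorem 1.15: a and b must be positive")
    q, r = 0, b
    # Walk down the sequence b, b - a, b - 2a, ... while a further
    # subtraction would still leave a nonnegative number.
    while r >= a:
        r -= a
        q += 1
    return q, r


print(divide(22, 5))  # (4, 2), since 22 = 4*5 + 2
```

For positive arguments, Python's built-in `divmod(b, a)` returns the same pair; the loop is shown only because it mirrors the proof.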


How to Think About It. We have been trained to regard the quotient q as 
more important than the remainder; r is just the little bit left over. But our 
viewpoint now is just the reverse. Given a and b, the important question for us 
is whether $a$ is a divisor of $b$. The remainder is the obstruction: $a \mid b$ if and
only if $r = 0$. This will be a common strategy: to see whether $a \mid b$, use the
Division Algorithm to get $b = qa + r$, and then try to show that $r = 0$.


The next result shows that there is no largest prime. The proof shows, given 
any finite set of primes, that there always exists another one. 

Corollary 1.16. There are infinitely many primes. 

Proof. (Euclid) Suppose, on the contrary, that there are only finitely many 
primes. If $p_1, p_2, \ldots, p_k$ is the complete list of all the primes, define

\[ M = (p_1 \cdots p_k) + 1. \]

By Proposition 1.14, $M$ is a product of primes. But $M$ has no prime divisor
$p_i$, for dividing $M$ by $p_i$ gives remainder 1 and not 0. For example, dividing
$M$ by $p_1$ gives $M = p_1(p_2 \cdots p_k) + 1$, so that the quotient and remainder are
$q = p_2 \cdots p_k$ and $r = 1$; dividing $M$ by $p_2$ gives $M = p_2(p_1 p_3 \cdots p_k) + 1$,
so that $q = p_1 p_3 \cdots p_k$ and $r = 1$; and so forth. The assumption that there
are only finitely many primes leads to a contradiction, and so there must be an
infinite number of them. ■

See Exercise 1.47 on page 30.
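Euclid's argument is constructive: from any finite list of primes it produces a prime that is not on the list. The Python sketch below illustrates this under our own naming (`smallest_prime_factor` and `new_prime` are hypothetical helpers, not from the text); it uses trial division, which is adequate only for small inputs.

```python
from math import prod


def smallest_prime_factor(n: int) -> int:
    """Return the smallest prime divisor of n >= 2 by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor up to sqrt(n), so n itself is prime


def new_prime(primes: list[int]) -> int:
    """Given a finite list of primes, return a prime that is not in the list."""
    m = prod(primes) + 1              # M = (p_1 ... p_k) + 1
    p = smallest_prime_factor(m)      # M is a product of primes (Proposition 1.14)
    assert all(m % q == 1 for q in primes)  # every listed prime leaves remainder 1
    return p


print(new_prime([2, 3, 5]))  # 31, because M = 31 is itself prime
print(new_prime([3, 5, 7]))  # 2, because M = 106 = 2 * 53
```

The second call shows that M need not be prime; the proof only needs some prime divisor of M, and none of the listed primes can be one.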


Linear Combinations and Euclid’s Lemma 

The greatest common divisor of two integers is a fundamental tool in studying 
factorization. 


Definition. A common divisor of integers $a$ and $b$ is an integer $c$ with $c \mid a$
and $c \mid b$. The greatest common divisor of $a$ and $b$, denoted by $\gcd(a, b)$ (or,
more briefly, by $(a, b)$), is defined by

\[ \gcd(a, b) = \begin{cases} 0 & \text{if } a = 0 = b, \\ \text{the largest common divisor of } a \text{ and } b & \text{otherwise.} \end{cases} \]

We saw, in Lemma 1.13, that if $a$ and $m$ are positive integers with $a \mid m$,
then $a \le m$. It follows, if at least one of $a, b$ is not zero, that gcd’s exist: there
are always common divisors (1 is always a common divisor), and there are
only finitely many positive common divisors, each $\le \max\{|a|, |b|\}$.


Lemma 1.17. If $p$ is a prime and $b$ is an integer, then

\[ \gcd(p, b) = \begin{cases} p & \text{if } p \mid b, \\ 1 & \text{otherwise.} \end{cases} \]

Proof. A common divisor $c$ of $p$ and $b$ is, in particular, a divisor of $p$. But the
only positive divisors of $p$ are $p$ and 1, and so $\gcd(p, b) = p$ or 1; it is $p$ if
$p \mid b$, and it is 1 otherwise. ■

If b > 0, then gcd(0, b) = b (why?). 


Definition. A linear combination of integers $a$ and $b$ is an integer of the form

\[ sa + tb, \]

where $s, t \in \mathbb{Z}$ (the numbers $s, t$ are allowed to be negative).


Example 1.18. The equation $b = qa + r$ in the Division Algorithm displays
$b$ as a linear combination of $a$ and $r$ (for $b = qa + 1 \cdot r$). Note that 0 is a linear
combination of any pair of integers: $0 = 0 \cdot a + 0 \cdot b$. There are infinitely many
linear combinations of 12 and 16, each of which is divisible by 4 (why?). It
follows that 5, for example, is not such a linear combination. ▲


The next result is one of the most useful properties of gcd’s. 

Theorem 1.19. If $a$ and $b$ are integers, then $\gcd(a, b)$ is a linear combination
of $a$ and $b$.

Proof. We may assume that at least one of $a$ and $b$ is not zero (otherwise,
the gcd is 0 and the result is obvious). Consider the set $I$ of all the linear
combinations of $a$ and $b$:

\[ I = \{ sa + tb : s, t \in \mathbb{Z} \}. \]

Both $a$ and $b$ are in $I$ (take $s = 1$ and $t = 0$ or vice versa). It follows that
$I$ contains positive integers (if $a < 0$, then $-a$ is positive, and $I$ contains
$-a = (-1)a + 0b$); hence, the set $C$ of all those positive integers lying in $I$ is
nonempty. By the Least Integer Axiom, $C$ contains a smallest positive integer,
say, $d$; we claim that $d$ is the gcd.

Since $d \in I$, it is a linear combination of $a$ and $b$: there are integers $s$ and
$t$ with

\[ d = sa + tb. \]

We’ll show that $d$ is a common divisor by trying to divide each of $a$ and $b$
by $d$. The Division Algorithm gives integers $q$ and $r$ with $a = qd + r$, where
$0 \le r < d$. If $r > 0$, then

\[ r = a - qd = a - q(sa + tb) = (1 - qs)a + (-qt)b \in C, \]

contradicting $d$ being the smallest element of $C$. Hence $r = 0$ and $d \mid a$; a
similar argument shows that $d \mid b$.

Finally, if $c$ is a common divisor of $a$ and $b$, then Exercise 1.46 on page 29
shows that $c$ divides every linear combination of $a$ and $b$; in particular, $c \mid d$.
By Lemma 1.13, we have $c \le d$. ■
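The proof identifies the gcd with the smallest positive element of the set of linear combinations. A brute-force check of that claim can be sketched in Python as follows; the function name and the choice of search window are our assumptions (the window is wide enough because coefficients s, t with |s| at most |b| and |t| at most |a| already realize the gcd), and `math.gcd` is used only to compare.

```python
from math import gcd


def smallest_positive_combination(a: int, b: int) -> tuple[int, int, int]:
    """Return (d, s, t) where d = s*a + t*b is the smallest positive
    linear combination found with s and t in a finite window."""
    if a == 0 and b == 0:
        raise ValueError("at least one of a and b must be nonzero")
    bound = max(abs(a), abs(b))  # assumed window for s and t
    best = None
    for s in range(-bound, bound + 1):
        for t in range(-bound, bound + 1):
            value = s * a + t * b
            if value > 0 and (best is None or value < best[0]):
                best = (value, s, t)
    return best


d, s, t = smallest_positive_combination(12, 16)
print(d, d == gcd(12, 16))  # 4 True
```

This is a check of the statement, not a practical way to compute gcd's; the search is quadratic in the size of the inputs.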

If $d = \gcd(a, b)$ and if $c$ is a common divisor of $a$ and $b$, then $c \le d$, by
Lemma 1.13. The next corollary shows that more is true: $c$ is a divisor of $d$;
that is, $c \mid d$ for every common divisor $c$.

Corollary 1.20. Let $a$ and $b$ be integers. A nonnegative common divisor $d$ is
their gcd if and only if $c \mid d$ for every common divisor $c$ of $a$ and $b$.

Proof. Necessity (the implication $\Rightarrow$). We showed that any common divisor of
$a$ and $b$ divides $\gcd(a, b)$ at the end of the proof of Theorem 1.19.

Sufficiency (the implication $\Leftarrow$). Let $d = \gcd(a, b)$, and let $D \ge 0$ be a
common divisor of $a$ and $b$ with $c \mid D$ for every common divisor $c$ of $a$ and
$b$. Now $d$ is a common divisor, so that $d \mid D$, by hypothesis; hence, $d \le D$,
by Lemma 1.13. But the definition of gcd ($d$ is the greatest common divisor)
gives $D \le d$, and so $D = d$. ■

The proof of Theorem 1.19 shows that $d = \gcd(a, b)$ is the smallest positive linear
combination of $a$ and $b$. It contains an idea that will be used again, as in
Exercise 1.49 on page 30.

In some treatments of number theory, Corollary 1.20 is taken as the definition
of gcd. Later, we will want to define greatest common divisor in other algebraic
structures. It often will not make sense to say that one element of such a structure
is greater than another, but it will make sense to say that one element divides
another. Corollary 1.20 will allow us to extend the notion of gcd.

The next theorem is of great interest: not only is it very useful, but it also
characterizes prime numbers.

Theorem 1.21 (Euclid’s Lemma). If $p$ is a prime and $p \mid ab$ for integers
$a, b$, then $p \mid a$ or $p \mid b$. Conversely, if $m \ge 2$ is an integer such that $m \mid ab$
always implies $m \mid a$ or $m \mid b$, then $m$ is a prime.

Proof. ($\Rightarrow$): Suppose that $p \mid ab$ and that $p \nmid a$; that is, $p$ does not divide $a$;
we must show that $p \mid b$. Since $\gcd(p, a) = 1$ (by Lemma 1.17), Theorem 1.19
gives integers $s$ and $t$ with $1 = sp + ta$. Hence,

\[ b = spb + tab. \]

Now $p$ divides both expressions on the right, for $p \mid ab$, and so $p \mid b$, by
Exercise 1.46 on page 29.

($\Leftarrow$): We prove the contrapositive. If $m$ is composite, then $m = ab$, where
$a < m$ and $b < m$. Now $m \mid m = ab$, but $m \nmid a$ and $m \nmid b$, by Lemma 1.13.
Thus, $m$ divides a product but it divides neither factor. ■

The contrapositive of a statement “P implies Q” is “not Q implies not P.”
For example, the contrapositive of “If I live in Chicago, then I live in
Illinois” is “If I don’t live in Illinois, then I don’t live in Chicago.” A statement
and its contrapositive are either both true or both false. Thus, to prove a
statement, it suffices to prove its contrapositive.

To illustrate: $6 \mid 12$ and $12 = 4 \times 3$, but $6 \nmid 4$ and $6 \nmid 3$. Of course, 6 is not
prime.

We will generalize Euclid’s Lemma in the next chapter. Theorem 2.8 says
that if $p$ is a prime and $p \mid a_1 \cdots a_n$ for integers $a_1, \ldots, a_n$, where $n \ge 2$, then
$p \mid a_i$ for some $i$.

Definition. Call integers a and b relatively prime if their gcd is 1 . 

Thus, a and b are relatively prime if their only common divisors are ± 1 . 
For example, 2 and 3 are relatively prime, as are 8 and 15. 

Here is a generalization of Euclid’s Lemma having the same proof. 


Corollary 1.22. Let $a$, $b$, and $c$ be integers. If $c$ and $a$ are relatively prime and
$c \mid ab$, then $c \mid b$.


Proof. Theorem 1.19 gives integers $s$ and $t$ with $1 = sc + ta$. Hence, $b =
scb + tab$. Now $c$ divides both expressions on the right, for $c \mid ab$, and so
$c \mid b$, by Exercise 1.46 on page 29. ■


How to Think About It. 

We have just seen one reason why it is important to know proofs: Corol- 
lary 1.22 does not follow from the statement of Euclid’s Lemma, but it does 
follow from its proof. See Exercise 1 .54 on page 34 for another example of 
this. 


Proposition 1.23. Let a and b be integers. 

(i) $\gcd(a, b) = 1$ (that is, $a$ and $b$ are relatively prime) if and only if 1 is a
linear combination of $a$ and $b$.

(ii) If $d = \gcd(a, b) \ne 0$, then the integers $a/d$ and $b/d$ are relatively prime.

Proof. (i) By Theorem 1.19, the gcd $d$ is a linear combination; here, $d = 1$.
Conversely, if $1 = sa + tb$ and $c$ is a common divisor of $a$ and $b$, then
$c \mid 1$, by Exercise 1.46 on page 29. Hence, $c = \pm 1$.

(ii) There are integers $s$ and $t$ with $d = sa + tb$. Divide both sides by $d =
\gcd(a, b)$:

\[ 1 = s\left(\frac{a}{d}\right) + t\left(\frac{b}{d}\right). \]

Since $d$ is a common divisor, both $a/d$ and $b/d$ are integers, and part (i)
applies. ■


Definition. An expression $a/b$ for a rational number (where $a$ and $b$ are integers
and $b \ne 0$) is in lowest terms if $a$ and $b$ are relatively prime.




Proposition 1.24. Every nonzero rational number a / b has an expression in 
lowest terms. 


Proof. If $d = \gcd(a, b)$, then $a = a'd$, $b = b'd$, and

\[ \frac{a}{b} = \frac{a'd}{b'd} = \frac{a'}{b'}. \]

But $a' = a/d$ and $b' = b/d$, so $\gcd(a', b') = 1$ by Proposition 1.23(ii). ■
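The proof of Proposition 1.24 is a recipe: divide numerator and denominator by their gcd. A minimal Python sketch, assuming integer inputs with a nonzero denominator, could look like this (the name `lowest_terms` is ours; `math.gcd` is the standard-library gcd and always returns a nonnegative value).

```python
from math import gcd


def lowest_terms(a: int, b: int) -> tuple[int, int]:
    """Return (a', b') with a/b == a'/b' and gcd(a', b') == 1."""
    if b == 0:
        raise ValueError("the denominator must be nonzero")
    d = gcd(a, b)          # d > 0 because b != 0
    return a // d, b // d  # a' = a/d and b' = b/d are integers


print(lowest_terms(12, 16))  # (3, 4)
print(lowest_terms(-6, 4))   # (-3, 2)
```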


We can now complete our discussion of Pythagorean triples. 

Definition. A Pythagorean triple $(a, b, c)$ is primitive if $a, b, c$ have no common
divisor $d \ge 2$; that is, there is no integer $d \ge 2$ which divides each of
$a$, $b$, and $c$.

Theorem 1.25 (Diophantus). Every Pythagorean triple $(a, b, c)$ is similar to
a unique primitive Pythagorean triple.

Proof. We show first that $(a, b, c)$ is similar to a primitive Pythagorean triple.
If $d$ is a common divisor of $a, b, c$, then $a = du$, $b = dv$, and $c = dz$, and
$(u, v, z)$ is a Pythagorean triple similar to $(a, b, c)$ (why?). If $d$ is the largest
common divisor of $a, b, c$, we claim that $(u, v, z)$ is primitive. Otherwise, there
is an integer $e \ge 2$ with $u = eu'$, $v = ev'$, and $z = ez'$; hence, $a = du =
deu'$, $b = dv = dev'$, and $c = dz = dez'$. Thus, $de > d$ is a common
divisor of $a, b, c$, contradicting $d$ being the largest such.

To prove uniqueness, suppose that $(a, b, c)$ is similar to two primitive Pythagorean
triples, say $(u, v, z)$ and $(r, s, t)$. It follows that the right triangles
$\triangle(u, v, z)$ and $\triangle(r, s, t)$ are similar, and so their sides are proportional, so
there is some positive number $h$ with

\[ u = hr, \quad v = hs, \quad\text{and}\quad z = ht. \]

Since the side lengths are integers, $h$ is rational, say $h = \ell/m$, and we may
assume that it is in lowest terms; that is, $\gcd(m, \ell) = 1$. Cross multiply:

\[ mu = \ell r, \quad mv = \ell s, \quad\text{and}\quad mz = \ell t. \]

By Corollary 1.22, $\ell$ is a common divisor of $u$, $v$, and $z$ and $m$ is a common
divisor of $r$, $s$, and $t$. Since both $(u, v, z)$ and $(r, s, t)$ are primitive, $\ell = 1 = m$,
and so $(u, v, z) = (r, s, t)$. ■
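The existence part of the proof divides a triple by the largest common divisor of its three entries. The Python sketch below does exactly that for positive integers; the function name is hypothetical, and computing the largest common divisor of three numbers as `gcd(gcd(a, b), c)` is our assumption about how to realize "largest common divisor of a, b, c" in code.

```python
from math import gcd


def primitive_triple(a: int, b: int, c: int) -> tuple[int, int, int]:
    """Return the primitive Pythagorean triple similar to (a, b, c)."""
    if min(a, b, c) <= 0 or a * a + b * b != c * c:
        raise ValueError("(a, b, c) must be a Pythagorean triple of positive integers")
    d = gcd(gcd(a, b), c)  # largest common divisor of a, b, and c
    return a // d, b // d, c // d


print(primitive_triple(9, 12, 15))   # (3, 4, 5)
print(primitive_triple(10, 24, 26))  # (5, 12, 13)
```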

This next result is significant in the history of mathematics. 

Proposition 1.26. There is no rational number $a/b$ whose square is 2.

Proof. Suppose, on the contrary, that $(a/b)^2 = 2$. We may assume that $a/b$ is
in lowest terms; that is, $\gcd(a, b) = 1$. Since $a^2 = 2b^2$, Euclid’s Lemma gives
$2 \mid a$, and so $2m = a$. Hence, $4m^2 = a^2 = 2b^2$, and $2m^2 = b^2$. Euclid’s
Lemma now gives $2 \mid b$, contradicting $\gcd(a, b) = 1$. ■


It follows that the legs of a Pythagorean triple $(a, b, c)$ cannot be equal, for
if $a = b$, then $a^2 + a^2 = c^2$, which implies that $2 = (c/a)^2$.

An indirect proof or proof
by contradiction has the
following structure. We
assume that the desired
statement is false and
reach a contradiction. We
conclude that the original
statement must be true.

Proposition 1.26 is often stated as “$\sqrt{2}$ is irrational,” which is a stronger
statement than what we’ve just proved. We can assert that $\sqrt{2}$ is irrational only
if we further assume that there exists a number $u$ with $u^2 = 2$.

Our proof can be made more elementary; we need assume only that at least 
one of a, b is odd. Also, see Exercise 1.75 on page 41. 


To bridge the gap between 
numbers and geometric 
magnitudes, Eudoxus 
(408 bce-355 bce) intro- 
duced the sophisticated 
notion of proportions (this 
idea, discussed in The Ele- 
ments, is equivalent to our 
contemporary definition of 
real numbers). 


Historical Note. The ancient Greeks defined number to mean “positive integer.”
Rationals were not viewed as numbers but, rather, as ways of comparing
two lengths. They called two segments of lengths $a$ and $b$ commensurable
if there is a third segment of length $c$ with $a = mc$ and $b = nc$ for positive
integers $m$ and $n$. That $\sqrt{2}$ is irrational was a shock to the Pythagoreans
(ca. 500 BCE); given a square with sides of length 1, its diagonal and side
are not commensurable; that is, $\sqrt{2}$ cannot be defined in terms of numbers
(positive integers) alone. Thus, there is no numerical solution to the equation
$x^2 = 2$, but there is a geometric solution.

By the time of Euclid, around 270 BCE, this problem had been resolved by 
splitting mathematics into two disciplines: number theory and geometry. 

In ancient Greece, algebra as we know it did not really exist. Euclid and 
the Greek mathematicians did geometric algebra. For simple ideas, e.g.,
$(a + b)^2 = a^2 + 2ab + b^2$ or completing the square, geometry clarifies algebraic
formulas (for example, see the right-hand part of Figure 1.2 on page 2
without the dashed lines). For more difficult ideas, say equations of higher de- 
gree, the geometric figures involved are very complicated, so that geometry is 
no longer clarifying. As van der Waerden writes in [34], p. 266, 

one has to be a mathematician of genius, thoroughly versed in trans- 
forming proportions with the aid of geometric figures, to obtain results 
by this extremely cumbersome method. Anyone can use our algebraic 
notation, but only a gifted mathematician can deal with the Greek theory 
of proportions and with geometric algebra. 

The problem of defining number has arisen several times since the classical 
Greek era. Mathematicians had to deal with negative numbers and with com- 
plex numbers in the 1500s after the discovery of the Cubic Formula, because 
that formula often gives real roots of a cubic polynomial, even integer roots, in 
unrecognizable form (see Chapter 3). The definition of real numbers generally 
accepted today dates from the late 1800s. But there are echoes of ancient Athens
in our time. Kronecker (1823-1891) wrote, 

Die ganzen Zahlen hat der liebe Gott gemacht, alles andere ist Men- 
schenwerk. (God created the integers; everything else is the work of 
Man.) 

Even today some logicians argue for a new definition of number. 


Exercises 

1.37 True or false, with reasons. Of course, it is important to get the right answer, but 
most attention should be paid to your reasoning. 

(i) $6 \mid 2$. Answer. False. (ii) $2 \mid 6$. Answer. True.

(iii) $6 \mid 0$. Answer. True. (iv) $0 \mid 6$. Answer. False.

(v) $0 \mid 0$. Answer. True.

1.38 True or false, with reasons.

(i) $\gcd(n, n + 1) = 1$ for every natural number $n$. Answer. True.

(ii) $\gcd(n, n + 2) = 2$ for every natural number $n$. Answer. False.

(iii) 113 is a sum of distinct powers of 2. Answer. True.

(iv) If $a$ and $b$ are natural numbers, then there are natural numbers $s$ and $t$ with
$\gcd(a, b) = sa + tb$. Answer. False.

(v) If an integer $m$ is a divisor of a product of integers $ab$, then $m$ is a divisor of
either $a$ or $b$ (or both). Answer. False.

1.39 Prove, or disprove and salvage if possible.

(i) $\gcd(0, b) = b$

(ii) $\gcd(a^2, b^2) = \gcd(a, b)^2$

(iii) $\gcd(a, b) = \gcd(a, b + ka)$

(iv) $\gcd(a, a) = a$

(v) $\gcd(a, b) = \gcd(b, a)$

(vi) $\gcd(a, 1) = 1$

(vii) $\gcd(a, b) = -\gcd(-a, b)$

(viii) $\gcd(a, 2b) = 2 \gcd(a, b)$


1.40 * If $x$ is a real number, let $\lfloor x \rfloor$ denote the greatest integer $n$ with $n \le x$. (For
example, $3 = \lfloor \pi \rfloor$ and $5 = \lfloor 5 \rfloor$.) If $q$ is the quotient in Theorem 1.15, show that
$q = \lfloor b/a \rfloor$.

1.41 *

(i) Given integers $a$ and $b$ (possibly negative) with $a \ne 0$, prove that there exist
unique integers $q$ and $r$ with $b = qa + r$ and $0 \le r < |a|$.

Hint. Use the portion of the Division Algorithm that has already been proved. 

(ii) If b and a are positive integers, do b and —b have the same remainder after 
dividing by a? Answer. No. 

1.42 For each of the following pairs $a, b$, find the largest nonnegative integer $n$ with
$n \le b/a < n + 1$.

(i) $a = 4$ and $b = 5$. Answer. $n = 1$.

(ii) $a = 5$ and $b = 4$. Answer. $n = 0$.

(iii) $a = 16$ and $b = 36$. Answer. $n = 2$.

(iv) $a = 36$ and $b = 124$. Answer. $n = 3$.

(v) $a = 124$ and $b = 1028$. Answer. $n = 8$.

1.43 Let $p_1, p_2, p_3, \ldots$ be the list of the primes in ascending order: $p_1 = 2$, $p_2 = 3$,
$p_3 = 5$, and so forth. Define $f_k = 1 + p_1 p_2 \cdots p_k$ for $k \ge 1$. Find the smallest
$k$ for which $f_k$ is not a prime.

Hint. $19 \mid f_7$, but 7 is not the smallest $k$.

1.44 What can you say about two integers $a$ and $b$ with the property that $a \mid b$ and
$b \mid a$? What if both $a$ and $b$ are positive?

1.45 * Show that if $a$ is positive and $a \mid b$, then $\gcd(a, b) = a$. Why do we assume
that $a$ is positive?

1.46 * (Two Out of Three). Suppose that $m$, $n$, and $q$ are integers and $m = n + q$. If $c$
is an integer that divides any two of $m, n, q$, show that $c$ divides the third one as
well.


“Disprove” here means “give a concrete counterexample.” “Salvage” means
“add a hypothesis to make it true.”

Allow for positive and negative values of $s$ and $t$.


1.47 *

(i) For each $a$ and $b$, give the smallest positive integer $d$ that can be written as
$sa + tb$ for integers $s$ and $t$:

• $a = 12$ and $b = 16$. Answer. $d = 4$.

• $a = 12$ and $b = 17$. Answer. $d = 1$.

• $a = 12$ and $b = 36$. Answer. $d = 12$.

• $a = 0$ and $b = 4$. Answer. $d = 4$.

• $a = 4$ and $b = 16$. Answer. $d = 4$.

• $a = 16$ and $b = 36$. Answer. $d = 4$.

• $a = 36$ and $b = 124$. Answer. $d = 4$.

• $a = 124$ and $b = 1028$. Answer. $d = 4$.

(ii) How is “smallest positive integer $d$ expressible as $sa + tb$” related to $a$ and $b$
in each case? Is $d$ a divisor of both $a$ and $b$?

1.48 * Show that the set of all linear combinations of two integers is precisely the set
of all multiples of their gcd.

1.49 * Let $I$ be a subset of $\mathbb{Z}$ such that

(i) $0 \in I$

(ii) if $a, b \in I$, then $a - b \in I$

(iii) if $a \in I$ and $q \in \mathbb{Z}$, then $qa \in I$.

Prove that there is a nonnegative integer $d \in I$ with $I$ consisting precisely of all
the multiples of $d$.

1.50 How might one define the gcd$(a, b, c)$ of three integers? When applied to a primitive
Pythagorean triple $(a, b, c)$, your definition should say that $\gcd(a, b, c) = 1$.

Euclidean Algorithm 

Our discussion of gcd’s is incomplete. What is gcd(12327, 2409)? To ask the 
question another way, is the expression 2409/12327 in lowest terms? The next 
result enables us to compute gcd’s efficiently. We first prove another lemma 
from Greek times. 

Lemma 1.27. Let $a$ and $b$ be integers.

(i) If $b = qa + r$, then $\gcd(a, b) = \gcd(r, a)$.

(ii) If $b \ge a$, then $\gcd(a, b) = \gcd(b - a, a)$.

Proof. (i) In light of Corollary 1.20, it suffices to show that an integer $c$ is a
common divisor of $a$ and $b$ if and only if it is a common divisor of $a$ and
$r$. Since $b = qa + r$, this follows from Exercise 1.46 on page 29.

(ii) This follows from part (i) because $b = 1 \cdot a + (b - a)$. ■

The hypothesis $b \ge a$ in part (ii) of Lemma 1.27 is not necessary; it is there
only to put you in the mood to accept the next example showing a method
the Greeks probably used to compute gcd’s. This method of computation is
nowadays called the Euclidean Algorithm; it is Theorem 1.29.

Example 1.28. In this example, we will abbreviate $\gcd(b, a)$ to $(b, a)$. Computing
$(b, a)$ is simple when $a$ and $b$ are small. If $b \ge a$, then Lemma 1.27
allows us to replace $(b, a)$ by $(b - a, a)$; indeed, we can continue replacing
numbers, $(b - 2a, a), (b - 3a, a), \ldots, (b - qa, a)$ as long as $b - qa \ge 0$. Since
the natural numbers $b - a, b - 2a, \ldots, b - qa$ are strictly decreasing, the Least
Integer Axiom (or Infinite Descent) says that they must reach a smallest such
integer: $r = b - qa$; that is, $0 \le r < a$. Now $(b, a) = (r, a)$. (We see the proof
of the Division Algorithm in this discussion.) Since $(r, a) = (a, r)$ and $a > r$,
they could continue replacing numbers: $(a, r) = (a - r, r) = (a - 2r, r) = \cdots$
(remember that the Greeks did not recognize negative numbers, so it was natural
for them to reverse direction). This process eventually ends, computing
gcd’s; we call it the Euclidean Algorithm. The Greek term for this method is
antanairesis, a free translation of which is “back and forth subtraction.” Let us
implement this idea before we state and prove the Euclidean Algorithm.

Antanairesis computes gcd(326, 78) as follows: 

(326, 78) = (248, 78) = (170, 78) = (92, 78) = (14, 78). 

So far, we have been subtracting 78 from the other larger numbers. At this 
point, we now start subtracting 14 (this is the reciprocal, direction-changing, 
aspect of antanairesis), for 78 > 14. 

(78, 14) = (64, 14) = (50, 14) = (36, 14) = (22, 14) = (8, 14). 

Again we change direction: 


(14,8) = (6,8). 

Change direction once again to get (8, 6) = (2, 6), and change direction one 
last time to get 


(6, 2) = (4, 2) = (2, 2) = (0, 2) = 2. 

Thus, gcd (326, 78) = 2. 
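The back-and-forth subtraction just carried out by hand can be sketched in Python. The function below is our own illustration (the name `antanairesis` and the decision to count subtractions are assumptions made for this example); it subtracts the second number from the first for as long as possible and swaps the pair whenever the first becomes the smaller one.

```python
def antanairesis(b: int, a: int) -> tuple[int, int]:
    """Return (gcd, number of subtractions) for positive integers a and b."""
    if a <= 0 or b <= 0:
        raise ValueError("both numbers must be positive")
    subtractions = 0
    while b != 0:
        if b < a:
            b, a = a, b        # change direction: (b, a) becomes (a, b)
        else:
            b -= a             # replace (b, a) by (b - a, a)
            subtractions += 1
    return a, subtractions     # the pair has become (0, a), and gcd(0, a) = a


print(antanairesis(326, 78))  # (2, 14): 4 + 5 + 1 + 1 + 3 subtractions
```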

The Division Algorithm and Lemma 1.27(i) give a more efficient way of
performing antanairesis. There are four subtractions in the passage from
$(326, 78)$ to $(14, 78)$; the Division Algorithm expresses this as

\[ 326 = 4 \cdot 78 + 14. \]

There are then five subtractions in the passage from $(78, 14)$ to $(8, 14)$; the
Division Algorithm expresses this as

\[ 78 = 5 \cdot 14 + 8. \]

There is one subtraction in the passage from $(14, 8)$ to $(6, 8)$:

\[ 14 = 1 \cdot 8 + 6. \]

There is one subtraction in the passage from $(8, 6)$ to $(2, 6)$:

\[ 8 = 1 \cdot 6 + 2, \]

and there are three subtractions from $(6, 2)$ to $(0, 2) = 2$:

\[ 6 = 3 \cdot 2. \]

The beginning of the proof of the theorem gives the algorithm.


Lamé (1795-1870) proved that the number of steps in the Euclidean Algorithm
cannot exceed 5 times the number of digits in the smaller number (see [26],
p. 49).


Theorem 1.29 (Euclidean Algorithm I). If a and b are positive integers, 
there is an algorithm computing gcd(a, b). 


Proof. Let us set $b = r_0$ and $a = r_1$, so that the equation $b = qa + r$ reads
$r_0 = q_1 a + r_2$. There are integers $q_j$ and positive integers $r_j$ such that

\[
\begin{aligned}
b = r_0 &= q_1 a + r_2, & r_2 &< a \\
a = r_1 &= q_2 r_2 + r_3, & r_3 &< r_2 \\
r_2 &= q_3 r_3 + r_4, & r_4 &< r_3 \\
&\;\;\vdots \\
r_{n-3} &= q_{n-2} r_{n-2} + r_{n-1}, & r_{n-1} &< r_{n-2} \\
r_{n-2} &= q_{n-1} r_{n-1} + r_n, & r_n &< r_{n-1} \\
r_{n-1} &= q_n r_n
\end{aligned}
\]

(remember that all $q_j$ and $r_j$ are explicitly known from the Division Algorithm).
There is a last remainder: the procedure stops (by Infinite Descent!)
because the remainders form a strictly decreasing sequence of nonnegative integers
(indeed, the number of steps needed is less than $a$).

We now show that the last remainder $r_n$ is the gcd.

\[
\begin{aligned}
b = q_1 a + r_2 \quad &\Rightarrow \quad \gcd(a, b) = \gcd(a, r_2) \\
a = q_2 r_2 + r_3 \quad &\Rightarrow \quad \gcd(a, r_2) = \gcd(r_2, r_3) \\
r_2 = q_3 r_3 + r_4 \quad &\Rightarrow \quad \gcd(r_2, r_3) = \gcd(r_3, r_4) \\
&\;\;\vdots \\
r_{n-2} = q_{n-1} r_{n-1} + r_n \quad &\Rightarrow \quad \gcd(r_{n-2}, r_{n-1}) = \gcd(r_{n-1}, r_n) \\
r_{n-1} = q_n r_n \quad &\Rightarrow \quad \gcd(r_{n-1}, r_n) = r_n.
\end{aligned}
\]


All the implications except the last follow from Lemma 1.27. The last one 
follows from Exercise 1.45 on page 29. ■ 
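The chain of divisions in the proof translates directly into a loop. The Python sketch below records the remainders $r_0, r_1, r_2, \ldots$ as it goes; the function name `euclid` and the decision to return the whole list are our own choices for illustration.

```python
def euclid(a: int, b: int) -> tuple[int, list[int]]:
    """Return (gcd(a, b), remainders) for positive integers a and b,
    where remainders starts r_0 = b, r_1 = a and ends with a final 0."""
    if a <= 0 or b <= 0:
        raise ValueError("both numbers must be positive")
    remainders = [b, a]
    # Each step divides r_{j-1} by r_j and appends the new remainder r_{j+1}.
    while remainders[-1] != 0:
        remainders.append(remainders[-2] % remainders[-1])
    return remainders[-2], remainders  # the last nonzero remainder is the gcd


g, rs = euclid(78, 326)
print(g)   # 2
print(rs)  # [326, 78, 14, 8, 6, 2, 0]
```

For this pair the loop performs five divisions, compared with the fourteen subtractions used by antanairesis in Example 1.28.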


Let’s rewrite the previous example in the notation of the proof of Theo- 
rem 1.29. The passage from one line to the line below it involves moving the 
boldface numbers “southwest.” 


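\begin{align}
326 &= 4 \cdot 78 + 14 \tag{1.7} \\
78 &= 5 \cdot 14 + 8 \tag{1.8} \\
14 &= 1 \cdot 8 + 6 \tag{1.9} \\
8 &= 1 \cdot 6 + 2 \tag{1.10} \\
6 &= 3 \cdot 2. \notag
\end{align}

The Euclidean Algorithm also allows us to find a pair of integers $s$ and $t$
expressing the gcd as a linear combination.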

Theorem 1.30 (Euclidean Algorithm II). If $a$ and $b$ are positive integers,
there is an algorithm computing a pair of integers $s$ and $t$ with $\gcd(a, b) =
sa + tb$.

Proof. It suffices to show, given equations

\begin{align*}
b &= qa + r \\
a &= q'r + r' \\
r &= q''r' + r'',
\end{align*}

how to write $r''$ as a linear combination of $b$ and $a$ (why?). Start at the bottom,
and write

\[ r'' = r - q''r'. \]

Now rewrite the middle equation as $r' = a - q'r$, and substitute:

\[ r'' = r - q''r' = r - q''(a - q'r) = (1 + q''q')r - q''a. \]

Now rewrite the top equation as $r = b - qa$, and substitute:

\[ r'' = (1 + q''q')r - q''a = (1 + q''q')(b - qa) - q''a. \]

Thus, $r''$ is a linear combination of $b$ and $a$. ■

We use the equations to find coefficients s and t expressing 2 as a linear 
combination of 326 and 78. Work from the bottom up. 

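\begin{align*}
2 &= 8 - 1 \cdot 6 && \text{by Eq. (1.10)} \\
  &= 8 - 1 \cdot (14 - 1 \cdot 8) && \text{by Eq. (1.9)} \\
  &= 2 \cdot 8 - 1 \cdot 14 \\
  &= 2 \cdot (78 - 5 \cdot 14) - 1 \cdot 14 && \text{by Eq. (1.8)} \\
  &= 2 \cdot 78 - 11 \cdot 14 \\
  &= 2 \cdot 78 - 11 \cdot (326 - 4 \cdot 78) && \text{by Eq. (1.7)} \\
  &= 46 \cdot 78 - 11 \cdot 326.
\end{align*}


Thus, $s = 46$ and $t = -11$.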
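The bottom-up substitution can be automated. One way to sketch it in Python is a recursive function that first handles the pair (r, a) and then rewrites the result in terms of (a, b), which is the same substitution step as in the proof of Theorem 1.30. The name `ext_gcd` and the recursive formulation are our assumptions for illustration, not the book's presentation.

```python
def ext_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (d, s, t) with d = gcd(a, b) = s*a + t*b, for a, b >= 0 not both 0."""
    if a == 0:
        return b, 0, 1               # gcd(0, b) = b = 0*a + 1*b
    q, r = divmod(b, a)              # b = q*a + r with 0 <= r < a
    d, s1, t1 = ext_gcd(r, a)        # d = s1*r + t1*a
    # Substitute r = b - q*a:  d = s1*(b - q*a) + t1*a = (t1 - q*s1)*a + s1*b
    return d, t1 - q * s1, s1


d, s, t = ext_gcd(78, 326)
print(d, s, t)               # 2 46 -11
print(s * 78 + t * 326)      # 2
```

For the pair 78, 326 this reproduces the coefficients found by hand above.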


How to Think About It. The algorithm produces one pair of coefficients that 
works. However, it’s not the only pair. For example, consider gcd(2, 3) = 1. 
A moment’s thought gives $s = -1$ and $t = 1$; but another moment’s thought
gives $s = 2$ and $t = -1$ (see Exercise 1.57 on page 35). However, the
Euclidean Algorithm always produces a specific pair of coefficients; assuming
that no mistakes in arithmetic are made, two people using the algorithm al- 
ways come up with the same s and t. 


Students usually encounter greatest common divisors in elementary school, 
sometimes as early as the fifth grade, when they learn how to add fractions 
and put the sum in lowest terms. As we have seen, putting a fraction in lowest 
terms involves the gcd of numerator and denominator. The preferred method 
of finding gcd’s in early grades involves prime factorization, for if integers a 
and b are small, then it is easy to factor them into primes: after several cancel- 
lations, the expression a/b is in lowest terms. Pedagogically, this may be the 
right choice, but finding gcd’s using prime factorization is practical only when 
numbers are small; can you put the fraction 167291/223377 in lowest terms 
using prime factorization? 


Putting a fraction in lowest terms is not always wise. For example,

\[ \frac{2}{3} + \frac{1}{5} = \frac{10}{15} + \frac{3}{15} = \frac{13}{15}. \]

How to Think About It. In calculating gcd’s with the Euclidean Algorithm,
many students get confused keeping track of the divisors and remainders. We
illustrate one way to organize the steps that has been effective with high school
students. Arrange the steps computing gcd(124, 1028) as a staircase of long
divisions, each remainder becoming the next divisor:

            8
    124 ) 1028
           992        3
            36 ) 124
                 108        2
                  16 ) 36
                       32        4
                        4 ) 16
                            16
                             0

The last nonzero remainder is the gcd, so gcd(124, 1028) = 4. This arrangement
can be used to read off coefficients $s$ and $t$ so that $4 = 124s + 1028t$. Start
at the next to last division and solve for each remainder:

\begin{align*}
4 &= 36 - 2 \cdot 16 \\
  &= 36 - 2 \cdot (124 - 3 \cdot 36) \\
  &= -2 \cdot 124 + 7 \cdot 36 \\
  &= -2 \cdot 124 + 7 \cdot (1028 - 8 \cdot 124) \\
  &= 7 \cdot 1028 - 58 \cdot 124.
\end{align*}


Exercises 

1.51 If $a$ and $b$ are positive integers, then $\gcd(a, b) = sa + tb$. Prove that either $s$ or $t$
is negative.

1.52 * Use Infinite Descent to prove that every positive integer $a$ has a factorization
$a = 2^k m$, where $k \ge 0$ and $m$ is odd. Now prove that $\sqrt{2}$ is irrational using this
fact instead of Euclid’s Lemma.

1.53 Prove that if $n$ is squarefree (i.e., $n > 1$ and $n$ is not divisible by the square of
any prime), then there is no rational number $x$ with $x^2 = n$.

Hint. Adapt the proof of Proposition 1.26.

1.54 * Assuming there is a real number $x$ with $x^3 = 2$, prove that $x$ is irrational.

1.55 (i) Find $d = \gcd(326, 78)$, find integers $s$ and $t$ with $d = 326s + 78t$, and put
the expression $326/78$ in lowest terms.

Answer. $d = 2$, $s = -11$, $t = 46$, and $163/39$.

(ii) Find $d = \gcd(12327, 2409)$, find integers $s$ and $t$ with $d = 12327s + 2409t$,
and put the expression $2409/12327$ in lowest terms.

Answer. $d = 3$, $s = 299$, $t = -1530$, and $803/4109$.

(iii) Find $d = \gcd(7563, 526)$, and express $d$ as a linear combination of 7563 and
526.

Answer. $d = 1$, $s = -37$, $t = 532$.

(iv) Find $d = \gcd(73122, 7404621)$ and express $d$ as a linear combination of
73122 and 7404621.

Answer. $d = 21$, $s = 34531$, $t = -341$.

1.56 * Prove that if $\gcd(r, m) = 1$ and $\gcd(r', m) = 1$, then $\gcd(rr', m) = 1$. Conclude
that if both $r$ and $r'$ are relatively prime to $m$, then so is their product $rr'$.

Hint. If $ar + bm = 1$ and $sr' + tm = 1$, consider $(ar + bm)(sr' + tm)$.

1.57 * Let $a$, $b$, and $d$ be integers. If $d = sa + tb$, where $s$ and $t$ are integers, find
infinitely many pairs of integers $(s_k, t_k)$ with $d = s_k a + t_k b$.

Hint. If $2s + 3t = 1$, then $2(s + 3) + 3(t - 2) = 1$.

1.58 * If $a$ and $b$ are relatively prime and each divides an integer $n$, prove that their
product $ab$ also divides $n$.

Hint. Use Corollary 1.22.

1.59 If $m > 0$, prove that $m \gcd(b, c) = \gcd(mb, mc)$. (We must assume that $m > 0$
lest $m \gcd(b, c)$ be negative.)

Hint. Show that if $k$ is a common divisor of $mb$ and $mc$, then $k \mid m \gcd(b, c)$.

1.60 Write $d = \gcd(a, b)$ as a linear combination of $a$ and $b$.

(i) $a = 4$ and $b = 16$.

Answer. $d = 4 = 5 \cdot 4 + (-1) \cdot 16$ (or, $4 = 1 \cdot 4 + 0 \cdot 16$).

(ii) $a = 16$ and $b = 36$.

Answer. $d = 4 = (-2) \cdot 16 + 1 \cdot 36$.

(iii) $a = 36$ and $b = 124$.

Answer. $d = 4 = 7 \cdot 36 + (-2) \cdot 124$.

(iv) $a = 124$ and $b = 1028$.

Answer. $d = 4 = (-58) \cdot 124 + 7 \cdot 1028$.

1.61 Given integers $a$, $b$, and $c$ with $c \mid a$ and $c \mid b$, prove that $c$ divides every linear
combination $sa + tb$.

1.62 Is anything wrong with this calculation? Explain your answer.

          4
      7 ) 37
          28
           9

1.63 Given integers $b$, $c$, $d$, and $e$ satisfying $b = 7c + 2$ and $d = 7e + 4$.

(i) What’s the remainder when $b + d$ is divided by 7?

Answer. 6.

(ii) What’s the remainder when $bd$ is divided by 7?

Answer. 1.

(iii) Explain your answers.

1.64 A lattice point is a point $(x, y)$ in the plane with both $x$ and $y$ integers.

(i) Which lattice points are on the line whose equation is $4x + 6y = 24$?

(ii) Which lattice points are on the line whose equation is $3x + 6y = 24$?

(iii) Find a line whose equation has integer coefficients but that never passes
through a lattice point.

(iv) Explain how to tell whether the line with equation $y = ax + b$ contains lattice
points.

The functions can be programmed into a calculator.

Functions are discussed in Appendix A.1.


1.65 Consider the calculation of $\gcd(124, 1028)$ on page 34. Show that the integer pairs

\[ (124, 1028),\ (36, 124),\ (16, 36),\ (4, 16),\ (0, 4) \]

have the same greatest common divisor.

1.66 Most calculators have functions computing quotients and remainders. Let $r(b, a)$
denote the remainder when $b$ is divided by $a$, and let $q(b, a)$ denote the quotient.
Find $r(b, a)$ and $q(b, a)$ if

(i) $a = 12$, $b = 16$. Answer. $q(16, 12) = 1$, $r(16, 12) = 4$.

(ii) $a = 16$, $b = 12$. Answer. $q(12, 16) = 0$, $r(12, 16) = 12$.

(iii) $a = 124$, $b = 1028$. Answer. $q(1028, 124) = 8$, $r(1028, 124) = 36$.

(iv) $a = 78$, $b = 326$. Answer. $q(326, 78) = 4$, $r(326, 78) = 14$.

1.67 Preview. Using the notation in Exercise 1.66, consider the pair of recursively 
defined functions on N: 

$$
s(a,b) = \begin{cases} 0 & a = 0 \\ t(r(b,a),a) - q(b,a)\cdot s(r(b,a),a) & a > 0, \end{cases}
\qquad
t(a,b) = \begin{cases} 1 & a = 0 \\ s(r(b,a),a) & a > 0. \end{cases}
$$

Find s(a, b) and t(a, b) if 

(i) a = 124, b = 1028. Answer. s(124, 1028) = −58, t(124, 1028) = 7. 

(ii) a = 36, b = 124. Answer. s(36, 124) = 7, t(36, 124) = −2. 

(iii) a = 78, b = 326. Answer. s(78, 326) = 46, t(78, 326) = −11. 

(iv) a = 1237, b = 2409. Answer. s(1237, 2409) = 1186, t(1237, 2409) = −609. 

(v) a = 7563, b = 526. Answer. s(7563, 526) = −37, t(7563, 526) = 532. 

(vi) a = 167291, b = 223377. Answer. s(167291, 223377) = −4, 
t(167291, 223377) = 3. 
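
The recursion in Exercise 1.67 can be tried on a computer as well as on a calculator. The following Python sketch is one possible transcription, not part of the text: the function names q, r, s, and t simply mirror the notation of Exercises 1.66 and 1.67, and the code assumes that a and b are nonnegative integers. In each case it also checks that s(a, b) · a + t(a, b) · b equals gcd(a, b).

```python
from math import gcd


def q(b: int, a: int) -> int:
    """Quotient when b is divided by a (assumes a > 0)."""
    return b // a


def r(b: int, a: int) -> int:
    """Remainder when b is divided by a (assumes a > 0)."""
    return b % a


def s(a: int, b: int) -> int:
    """Coefficient of a in the linear combination s*a + t*b."""
    if a == 0:
        return 0
    return t(r(b, a), a) - q(b, a) * s(r(b, a), a)


def t(a: int, b: int) -> int:
    """Coefficient of b in the linear combination s*a + t*b."""
    if a == 0:
        return 1
    return s(r(b, a), a)


for a, b in [(124, 1028), (36, 124), (78, 326), (167291, 223377)]:
    combo = s(a, b) * a + t(a, b) * b
    # The last two printed numbers should agree.
    print(a, b, s(a, b), t(a, b), combo, gcd(a, b))
```

This direct transcription recomputes the same values many times; that is harmless for numbers of this size, but a version returning the pair (s, t) in a single pass would be preferable for large inputs.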

1.4 Nine Fundamental Properties 

We now focus on a small number (nine) of properties of arithmetic, for it turns 
out that many of the usual rules follow from them. This obviously simplifies 
things, making explicit what we are allowed to assume. But we have an ulterior 
motive. The properties will eventually be treated as axioms that will describe 
addition and multiplication in other systems, such as complex numbers, poly- 
nomials, and modular arithmetic; these systems lead naturally to their common 
generalization, commutative rings. 

Notation. The set of all rational numbers is denoted by Q, and the set of all 
real numbers is denoted by R. 

We begin by stating some basic properties of real numbers (of course, inte- 
gers and rationals are special cases). These properties undergird a great deal of 
high school algebra; they are essential for the rest of this book and, indeed, for 
abstract algebra. 

Addition and multiplication are functions R × R → R, namely, (a, b) ↦ 
a + b and (a, b) ↦ ab. The Laws of Substitution say that if a, a′, b, b′ are 
real numbers with a = a′ and b = b′, then 

a + b = a' + b' and ab = a'b' . 

The Laws of Substitution are used extensively (usually tacitly) when solving 
equations or transforming expressions, and they merely say that addition and 
multiplication are single-valued. For example, since −5 + 5 = 0, we have 


(−5 + 5) × (−1) = 0 × (−1) = 0. 


Here are the properties of addition we are emphasizing. 

Addition : For all real numbers a, b, and c, 

(i) Commutativity, a + b = b + a, 

(ii) 0 + a = a, 

(iii) there is a number −a, called the negative of a (or its additive inverse), 
with −a + a = 0, 

(iv) Associativity, a + (b + c) = (a + b) + c. 

Let’s say a bit more about associativity. Addition is defined as an operation 
performed on two numbers at a time, but it’s often necessary to add three or 
more numbers. Associativity says that, when evaluating, say 2 + 5 + 3, we 
can first add 2 and 5, giving 7 + 3 = 10, or we can first add 5 and 3, giving 
2 + 8 = 10. In other words, we don’t need parentheses: writing 2 + 5 + 3 
is unambiguous because (2 + 5) + 3 = 2 + (5 + 3). This is not the case 
with subtraction. What is 8 − 3 − 2? If we first subtract 8 − 3, then the answer 
is 5 − 2 = 3. However, if we evaluate 8 − (3 − 2) = 8 − 1, we obtain a 
different answer. Thus, subtraction R² → R, defined by (a, b) ↦ a − b, is not 
associative, and we do need parentheses for it. 

Here are the properties of multiplication that we are emphasizing; note that 
they are, formally, the same as those for addition: just replace “plus” by “times” 
(we usually denote the product of numbers a and b by ab, although we will 
occasionally write a · b or a × b). 

Multiplication: For all real numbers a, b, and c, 

(i) Commutativity, ab = ba, 

(ii) 1 · a = a, 

(iii) If a ≠ 0, there is a number a⁻¹, called its (multiplicative) inverse (or its 
reciprocal) with a · a⁻¹ = 1, 

(iv) Associativity, a(bc) = (ab)c. 

Finally, we highlight a property involving both addition and multiplication. 
Distributivity. a(b + c) = ac + ab. 

Reading from left to right, distributivity says that we can “multiply a through;” 
reading from right to left, distributivity says that we can “factor a out.” 

Aside from the two Laws of Substitution, one for addition and one for mul- 
tiplication, we have now listed nine properties of addition and multiplication. 

Subtraction and division are defined as follows. 


Given associativity for the sum or product of 3 numbers, generalized 
associativity is also true: we don’t need parentheses for the sum or product of 
n ≥ 3 numbers. A proof is in Appendix A.5. 


Why do we assume that a ≠ 0? Read on. 






Definition. If a and b are numbers, define subtraction by 


b − a = b + (−a), 

where −a is the negative of a; that is, −a is the number which, when added 
to a, gives 0. 

Quotient (or division) is defined similarly. 

Definition. If a and b are numbers with b ≠ 0, then the quotient of a by b 
is ab⁻¹, where b⁻¹ is the number which, when multiplied by b, gives 1. We 
often denote ab⁻¹ by a/b. 

The word quotient is used here in a different way than in the Division Algorithm, 
where it is ⌊b/a⌋, the integer part of b/a (see Exercise 1.40 on page 29). 


How to Think About It. Almost all the properties just listed for the set R 
of real numbers also hold for the set Z of integers — these properties are “in- 
herited” from R because integers are real numbers. The only property that Z 
doesn’t inherit is the existence of multiplicative inverses. While every nonzero 
integer does have an inverse in R, it may not be an integer; in fact, the only 
nonzero integers whose inverses also lie in Z are 1 and —1. There are other 
familiar algebraic systems that are more like Z than R in the sense that multi- 
plicative inverses may not exist in the system. For example, all polynomials in 
one variable with rational coefficients form such a system, but the multiplicative 
inverse 1/x of x is not a polynomial. 


Multiplication by 5 is a 
bijection, and we are 
saying that division by 5 
is its inverse function. 
See Example A.10 in 
Appendix A.1. 


Other familiar “rules” of arithmetic are easy consequences of these fundamen- 
tal ones. Here are some of them. 

Proposition 1.31. For every number a, we have 0 x a = 0. 

Proof. By Addition Rule (ii), we have 0 + 0 = 0. Therefore, 

0 × a = (0 + 0)a = (0 × a) + (0 × a). 

Now subtract 0 × a from both sides to obtain 0 = 0 × a. ■ 

What is the meaning of division by 5? When we say that 20 ÷ 5 = 4, we 
mean that 20/5 is a number (namely 4), and that (20/5) × 5 = 20. Dividing 
is the “opposite” of multiplying: dividing by 5 undoes multiplying by 5. This 
agrees with our formal definition. The inverse of 5 is 5⁻¹, and 20 · 5⁻¹ = 
20/5 = 4. 

Can we divide by zero? If so, then 1/0 would be a number with 0 × (1/0) = 
1. But we have just seen that 0 × a = 0 for any number a. In particular, 
0 × (1/0) = 0, giving the contradiction 1 = 0. It follows that 1/0 = 0⁻¹ is 
not a number; we cannot divide by 0. 

Here is another familiar consequence of the nine fundamental properties. 

Proposition 1.32. For a number a, we have 


(−a) · (−1) = a. 

In particular, 


(−1) · (−1) = 1. 


Proof. The distributive law gives 


0 = 0 · (−a) = (−1 + 1)(−a) = (−1) · (−a) + (−a). 


The Law of Substitution 
allows us to replace 0 by 
-1 + 1 . 


Now, add a to both sides to get a = (−1)(−a). ■ 


How to Think About It. 

Even though its proof is very simple, Proposition 1.32 is often presented to 
high school students as something mysterious and almost magical. We can only 
guess at a reason. From Euclid’s time until the 1500s, numbers were always 
positive; either negative numbers were not recognized at all or, if they did 
appear, they were regarded with suspicion, as not being bona fide (the complex 
numbers, which came on the scene around the same time, were also suspected 
of witchcraft). In the proof of Proposition 1.32, we treated negative numbers 
without prejudice, and we assumed that they obey the same elementary rules 
as positive numbers do. And we have reaped a reward for clear thinking. 


Addition Rule (iii) states that every real number has a negative, an additive 
inverse. Can a number a have more than one negative? Intuition tells us no, 
and this can be proved using the nine fundamental properties. 

Proposition 1.33. Negatives in R are unique; that is, for a ∈ R, there is 
exactly one number b in R with b + a = 0. 

Multiplicative inverses of nonzero real numbers are unique; that is, for 
nonzero c ∈ R, there is exactly one real number d with cd = 1. 

Proof. Suppose b is a number with a + b = 0. Add −a to both sides: 

−a + (a + b) = −a. 

We can now use associativity to calculate, like this. 

−a + (a + b) = −a 
(−a + a) + b = −a 
0 + b = −a 
b = −a. 

This argument can be adapted to prove uniqueness of multiplicative in- 
verses; merely replace + by x and “additive inverse’’ by “multiplicative in- 
verse.” ■ 

Uniqueness theorems like Proposition 1.33 are useful because they show 
that certain objects are characterized by their behavior. For example, to show 
that a number b is equal to —a, add b to a and see if you get 0. This is the 
strategy in the next proof. 

Corollary 1.34. For every real number a, we have −a = (−1)a. Similarly, if 
b ≠ 0, then (b⁻¹)⁻¹ = b. 

Proof. We add (−1)a to a and see if we get 0. 


(−1)a + a = (−1)a + 1 · a = a(−1 + 1) = a · 0 = 0. 


We do get 0, and so Proposition 1.33 guarantees that −a = (−1)a. 

To prove the second statement, interpret the equation bb⁻¹ = 1 as saying 
that b is an element which, when multiplied by b⁻¹, gives 1. ■ 

We can now prove the distributive law for subtraction. 

Corollary 1.35. If a, b, c are real numbers, then a(b − c) = ab − ac. 

Proof. By definition, b − c = b + (−c). But b − c = b + (−1)c, by Corollary 
1.34. Therefore, distributivity gives 

a(b − c) = a(b + (−1)c) 
= ab + a(−1)c 
= ab + (−1)(ac) 
= ab − ac. ■ 

We have just displayed some properties of addition and multiplication of 
real numbers following from the nine fundamental properties. The proofs fol- 
low only from the nine properties; we did not use any other properties of R, 
such as decimal expansions or inequalities. Hence, if we show, for example, 
that addition and multiplication of complex numbers or of polynomials satisfy 
the nine properties, then each of these systems satisfies the “other properties,” 
Propositions 1.31, 1.32, and 1.33, as well. 


Exercises 

1.68 (i) Prove the additive cancellation law using only the nine properties: if a, b, c 
are real numbers with a + c = b + c, then a = b. 

(ii) Prove the multiplicative cancellation law for real numbers using only the nine 
properties: if a, b, c are real numbers with ac = bc and c ≠ 0, then a = b. 

1.69 Suppose that b ≠ 0. Show that a/b is the unique real number whose product with 
b is a. 

1.70 (i) Prove that a real number a is a square if and only if a ≥ 0. 

(ii) Prove that every complex number is a square. 

1.71 * Let a, b, c be numbers. 

(i) Prove that −ac, the negative of ac, is equal to (−a)c; that is, ac + (−a)c = 0. 

(ii) In the proof of Corollary 1.35, we stated that 

ab + a(−1)c = ab − ac. 

Prove this. 

Hint. Evaluate a(0 + 0) in two ways. 

1.72 * Suppose that e and f are integers and let m = min{e, f} and M = max{e, f}. 
Show that 


m + M = e + f. 

1.73 * 

(i) If a is a positive real number such that aⁿ = 1 for an integer n ≥ 1, prove 
that a = 1. 

(ii) If a is a real number such that aⁿ = 1 for an integer n ≥ 1, prove that 
a = ±1. 

1.74 The Post Office has only 5 and 8 cent stamps today. Which denominations of 
postage can you buy? 

1.75 * Later in this book, we’ll prove Theorem 2.10: every integer can be factored into 
primes in essentially only one way. You may use this theorem here. 

(i) If a ∈ Z, prove that every prime p that divides a² shows up with even exponent; 
that is, if p | a², then p² | a². 

(ii) Show that there are no integers a and b so that 2a² = b². 

(iii) Use part (ii) to show that there is no rational number x with x² = 2. 

1.76 Use Euclid’s idea of a geometric Division Algorithm (see Figure 1.8 on page 22) 
to give a geometric version of the Euclidean Algorithm that uses repeated geo- 
metric division. Apply your geometric algorithm to 

(i) two segments of length 12 and 90. 

(ii) the diagonal and the side of a square. 

1.5 Connections 

This section applies the method of Diophantus to trigonometry and to calculus. 

Trigonometry 

The formulas x = (1 − t²)/(1 + t²) and y = 2t/(1 + t²), where t is a 
real number, parametrize all the points on the unit circle except (−1, 0). But 
we know that if A = (x, y) is a point on the unit circle, then x = cos θ and 
y = sin θ, where θ = ∠DOA (see Figure 1.9). 



If θ = 30°, then (cos θ, sin θ) = (√3/2, 1/2); one coordinate is irrational and 
one is rational. Are there any acute angles θ with both cos θ and sin θ rational? 
If (x, y) is the Pythagorean point arising from (3, 4, 5), then x = cos θ = 3/5 
and y = sin θ = 4/5. With a little more work, we can prove that there are 
infinitely many angles θ with both cos θ and sin θ rational (is it obvious that 
Pythagorean triples arising from distinct Pythagorean points are not similar?) 
and also infinitely many angles with both cosine and sine irrational (see Exercise 
1.29 on page 14). 

The parametrization of the unit circle in Proposition 1.2, 

$$\cos\theta = \frac{1 - t^2}{1 + t^2} \quad\text{and}\quad \sin\theta = \frac{2t}{1 + t^2}, \qquad -\infty < t < \infty,$$

enables us to prove some trigonometric identities. For example, let’s prove the 
identity 

$$\frac{1 + \cos\theta + \sin\theta}{1 + \cos\theta - \sin\theta} = \sec\theta + \tan\theta.$$


First, rewrite everything in terms of sin θ and cos θ. The left-hand side is fine; 
the right-hand side is (1/cos θ) + (sin θ/cos θ). Now replace these by their 
formulas in t. The left-hand side is 

$$\frac{1 + \dfrac{1 - t^2}{1 + t^2} + \dfrac{2t}{1 + t^2}}{1 + \dfrac{1 - t^2}{1 + t^2} - \dfrac{2t}{1 + t^2}},$$

and this simplifies to a rational function of t (that is, a quotient of two polynomials). 
Similarly, the right-hand side is also a rational function of t, for 

$$\sec\theta = \frac{1}{\cos\theta} = \frac{1 + t^2}{1 - t^2} \quad\text{and}\quad \tan\theta = \frac{2t}{1 - t^2}.$$

Thus, verifying whether the 
trigonometric identity is true is the same thing as verifying whether one rational 
expression is equal to another. This problem involves no ingenuity at 
all. Just cross multiply and check whether the polynomials on either side are 
equal; that is, check whether the monomials on either side having the same 
degree have the same coefficients. 
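
Because both sides of the identity become rational functions of t, the comparison can be spot-checked with exact rational arithmetic. The Python sketch below is only an illustration, not part of the text: the helper names cos_sin and identity_holds are made up here, and checking sample values of t is evidence rather than a proof. The values t = 1 and t = −1 are skipped because cos θ = 0 there and the identity is undefined.

```python
from fractions import Fraction


def cos_sin(t):
    """Return (cos θ, sin θ) for the point with parameter t."""
    d = 1 + t * t
    return (1 - t * t) / d, 2 * t / d


def identity_holds(t):
    """Compare both sides of the identity exactly for one value of t."""
    c, s = cos_sin(t)
    lhs = (1 + c + s) / (1 + c - s)
    rhs = 1 / c + s / c  # sec θ + tan θ
    return lhs == rhs


# Rational sample points p/7, leaving out t = 1 and t = -1.
samples = [Fraction(p, 7) for p in range(-20, 21) if p not in (-7, 7)]
print(all(identity_holds(t) for t in samples))
```

For the parameter t = 1/2 the helper returns the Pythagorean point (3/5, 4/5).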


Integration 

The parametrization of the unit circle is useful for certain integration problems. 
In Figure 1.10, we see that △AOB is isosceles, for two sides are radii; thus, the 


We denote the line de- 
termined by points A 
and B by L(A,B), but 
the notation AB is the 
convention in geometry 
and precalculus books. 



base angles are equal. But the exterior angle θ is their sum, and so ∠BAO = 
θ/2. Therefore, 

$$t = \frac{\sin\theta}{1 + \cos\theta} = \text{slope } L(A, B) = \tan(\theta/2);$$

t = tan(θ/2) is called the tangent half-angle formula. Now 

$$\theta = 2\arctan t \quad\text{and}\quad d\theta = \frac{2\,dt}{1 + t^2}.$$

Let’s apply this substitution. In most calculus courses, the indefinite integral 
∫ sec θ dθ = log |sec θ + tan θ| is found by some unmotivated trick, but this 
integration is quite natural when we use the method of Diophantus. 


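$$\int \sec\theta\,d\theta = \int \frac{d\theta}{\cos\theta} = \int \frac{1 + t^2}{1 - t^2}\cdot\frac{2\,dt}{1 + t^2} = \int \frac{2\,dt}{1 - t^2}.$$

Since 

$$\frac{2}{1 - t^2} = \frac{1}{1 + t} + \frac{1}{1 - t},$$

we have 

$$\int \frac{2\,dt}{1 - t^2} = \int \frac{dt}{1 + t} + \int \frac{dt}{1 - t} = \log|1 + t| - \log|1 - t|.$$

The hard part is now done; 

$$\log|1 + t| - \log|1 - t| = \log\left|\frac{1 + t}{1 - t}\right|,$$

and it is cosmetic to rewrite, using the formula relating t and θ: 

$$\frac{1 + t}{1 - t} = \frac{(1 + t)^2}{1 - t^2} = \frac{1 + 2t + t^2}{1 - t^2} = \frac{1 + t^2}{1 - t^2} + \frac{2t}{1 - t^2} = \sec\theta + \tan\theta.$$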


Other integrands can also be integrated using the tangent half-angle formula 
(see Exercise 1.78 below). Similar parametrizations of other conic sections also 
lead to integration formulas (see Exercises 1.80-1.82 below and [28, pp. 86- 
97]). 

Exercises 


1.77 Verify the following trigonometric identities. 

(i) 1 + csc θ = (cos θ cot θ)/(1 − sin θ). 

(ii) 1/(csc θ − cot θ) − 1/(csc θ + cot θ) = 2 cot θ. 

(iii) cot⁴ θ + cot² θ = csc⁴ θ − csc² θ. 


1.78 Integrate the following using the tangent half-angle formula. 

(i) ∫ sin θ/(2 + cos θ) dθ. 

Answer. ln [(1 + t²)/(3 + t²)], where t = tan(θ/2). 

(ii) ∫ (sin θ − cos θ)/(sin θ + cos θ) dθ. 

Answer. ln |(1 + t²)/(1 + 2t − t²)|, which leads to −ln |cos θ + sin θ|. 
1.79 * Preview. 

(i) Sketch the graph of x² − xy + y² = 1. 

(ii) Find a “sweeping lines” parametrization for the points on the graph of 
x² − xy + y² = 1. 

(iii) Find a scalene triangle with integer side lengths and a 60° angle. 

1.80 Take It Further. 

(i) Find a “sweeping lines” parametrization for the points on the graph of the 
parabola x = y², using lines joining A = (0, 0) to points P = (x, y) on the 
parabola. 

(ii) Use this parametrization to evaluate 

1.81 Take It Further. Show that a “sweeping lines” parametrization for the points 
on the ellipse x²/a² + y²/b² = 1, using lines joining A = (−a, 0) to points 
P = (x, y) on the ellipse, is 

x = a(b² − a²t²)/(b² + a²t²) and y = 2ab²t/(b² + a²t²). 

1.82 Take It Further. Show that a “sweeping lines” parametrization for the points on 
the hyperbola x²/a² − y²/b² = 1, using lines joining A = (−a, 0) to points 
P = (x, y) on the hyperbola, is 

x = a(b² + a²t²)/(b² − a²t²) and y = 2ab²t/(b² − a²t²). 

1.83 * Take It Further. Most high school texts derive the quadratic formula by “com- 
pleting the square,” a method we’ll discuss and generalize in Chapter 3. Here’s 
another way to derive the formula. 

(i) Show that if r and s are the roots of x² + bx + c = 0, then 

r + s = −b and 
rs = c. 

(ii) If r + s = −b and rs = c, show that 

(r − s)² = b² − 4c, 

so that r − s = ±√(b² − 4c). 

(iii) Solve the system 

r + s = −b 
r − s = ±√(b² − 4c) 

for r and s. 




Induction 



In Chapter 1, we proved some basic theorems of ordinary arithmetic: Division 
Algorithm; Euclidean Algorithm; prime factorization. We are now going to 
prove the Fundamental Theorem of Arithmetic: any two people writing an in- 
teger as a product of primes always get the same factors. We need a very useful 
tool in order to do this, and so we interrupt our historical account to introduce 
mathematical induction, a method of proof that finds application throughout 
mathematics. We’ll go on here to use induction to discuss the Binomial Theo- 
rem and some combinatorics. 


2.1 Induction and Applications 

The term induction has two meanings. The most popular one is inductive rea- 
soning: the process of inferring a general law from the observation of partic- 
ular instances. For example, we say that the Sun will rise tomorrow morning 
because, from the dawn of time, the Sun has risen every morning. Although 
this notion of induction is used frequently in everyday life, it is not adequate 
for mathematical proofs, as we now show. 

Consider the assertion: “f(n) = n² − n + 41 is prime for every positive 
integer n.” Evaluating f(n) for n = 1, 2, 3, . . . , 40 gives the numbers 

41, 43, 47, 53, 61, 71, 83, 97, 113, 131, 

151, 173, 197, 223, 251, 281, 313, 347, 383, 421, 

461, 503, 547, 593, 641, 691, 743, 797, 853, 911, 

971, 1033, 1097, 1163, 1231, 1301, 1373, 1447, 1523, 1601. 

It is tedious, but not very difficult (see Exercise 2.2 on page 52), to show that 
every one of these numbers is prime. Inductive reasoning leads you to expect 
that all numbers of the form f(n) are prime. But the next number, f(41) = 
1681, is not prime, for f(41) = 41² − 41 + 41 = 41², which is obviously 
composite. 
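
The tedious verification is easy to hand to a computer. The Python sketch below is an illustration only: is_prime is a plain trial-division helper introduced here, and f is the polynomial from the assertion above.

```python
def is_prime(m: int) -> bool:
    """Trial division; adequate for the small numbers involved here."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def f(n: int) -> int:
    """The polynomial n^2 - n + 41."""
    return n * n - n + 41


# True exactly when f(1), ..., f(40) are all prime.
print(all(is_prime(f(n)) for n in range(1, 41)))

# The pattern breaks at n = 41, where f(41) = 41 * 41.
print(f(41), is_prime(f(41)))
```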

An even more spectacular example of the failure of inductive reasoning is 
given by the harmonic series 1 + 1/2 + 1/3 + ··· + 1/n + ···, which diverges 
(first proved by Oresme (ca. 1320–1382)), and so its partial sums get arbitrarily 
large. Given a number N, there is a partial sum 

$$\Sigma_m = \sum_{n=1}^{m} \frac{1}{n} = 1 + \frac{1}{2} + \cdots + \frac{1}{m}$$


We shall see later that 
many interesting number 
systems do not have 
unique factorization. 
Indeed, not recognizing 
this fact is probably 
responsible for many false 
“proofs” of Fermat’s Last 
Theorem. 
with Σₘ > N. A high school student, unaware of this, playing with his calculator 
and seeing that Σ₃₁₅ ≈ 6.33137, would probably make the reasonable 
guess that Σₘ < 100 for all m. But he’s wrong; the series diverges! It is 
known that if m < 1.5 × 10⁴³, then Σₘ < 100. The most generous estimate 
of the age of the Earth is ten billion (10,000,000,000) years, or 3.65 × 10¹² 
days, a number insignificant when compared to 1.5 × 10⁴³. Therefore, starting 
from the Earth’s very first day, if the statement Σₘ < 100 was verified on the 
mth day, then there would be today as much evidence of the general truth of 
these statements as there is that the Sun will rise tomorrow morning. And yet 
most statements Σₘ < 100 are false! 

Inductive reasoning is valuable in mathematics, as it is in natural science, 
because seeing patterns in data often helps us guess what may be true in general 
(see Exercise 2.1 on page 52, for example). However, merely checking whether 
the first few (or first few trillion) statements are true is not enough. We have 
just seen that checking the first 1.5 x 10 43 statements is inadequate to establish 
a general rule. 
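
The calculator experiment with the harmonic series can be repeated in a few lines of Python. This is a sketch, not part of the text: partial_sum is a helper named here for illustration, and the size estimate relies on the standard approximation that the mth partial sum is close to ln m + γ, where γ ≈ 0.5772 is Euler's constant. That approximation is an assumption brought in for this example.

```python
import math


def partial_sum(m: int) -> float:
    """Partial sum 1 + 1/2 + ... + 1/m of the harmonic series."""
    return math.fsum(1 / n for n in range(1, m + 1))


# The value a student might see on a calculator for m = 315.
print(partial_sum(315))

# Rough size of m at which the partial sums reach 100, assuming
# partial_sum(m) is approximately ln(m) + gamma (not from the text).
EULER_GAMMA = 0.5772156649
estimate = math.exp(100 - EULER_GAMMA)
print(f"{estimate:.2e}")
```

The estimate comes out at roughly 1.5 × 10⁴³, so no amount of direct checking would ever reveal the first false statement.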

Let’s now discuss mathematical induction. Suppose we are given a sequence 
of statements 

S(1), S(2), S(3), ..., S(n), .... 

For example, the formula 2ⁿ > n for all n ≥ 1 can be viewed as the sequence 
of statements 

2¹ > 1, 2² > 2, 2³ > 3, ..., 2ⁿ > n, .... 

Mathematical induction is a technique for proving that all the statements are 
true. 

The key idea is just this. Imagine a stairway to the sky. We claim that if 
its bottom step is white and the next step above any white step is also white, 
then all the steps of the stairway are white. Here’s our reasoning. If some steps 
aren’t white, walk up to the first non-white step; call it Fido. Now Fido can’t 
be at the bottom, for the bottom step is white, and so there is a step just below 
Fido. This lower step must be white, because Fido is the first non-white one. 
But Fido, being the next step above a white step, must also be white. This is a 
contradiction; there is no Fido. All the steps are white. 

To sum up, given a list of statements, we are claiming that if 

(i) the first statement is true, and 

(ii) whenever a statement is true, so is the next one, 


The symbol ⇒ means 
implies. 


then all the statements on the list are true. 

Let’s apply this idea to the list of inequalities S(n): 2ⁿ > n. Now S(1) is 
true, for 2¹ = 2 > 1. Suppose we believe, for every n > 1, that the implication 
2ⁿ⁻¹ > n − 1 ⇒ 2ⁿ > n is true. Since S(1) is true and S(1) ⇒ S(2) is true, 
we have S(2) true; that is, if 2¹ > 1 and 2¹ > 1 ⇒ 2² > 2 are both true, 
then 2² > 2. Since 2² > 2 is true and 2² > 2 ⇒ 2³ > 3 is true, we have 
2³ > 3; since 2³ > 3 is true and 2³ > 3 ⇒ 2⁴ > 4 is true, we have 2⁴ > 4; 
and so forth. Mathematical induction replaces the phrase and so forth with 
statement (ii), which guarantees, for every n, that there is never an obstruction 
in the passage from the truth of any statement S(n − 1) to the truth of the next 
one S(n). We will prove 2ⁿ > n for all n ≥ 1 in Proposition 2.2. 

Here is the formal statement of mathematical induction. 
Theorem 2.1 (Mathematical Induction). Let k be an integer. If S(k), 
S(k + 1), S(k + 2), . . . is a sequence of statements such that 

(i) Base Step: S(k) is true, and 

(ii) Inductive Step: for n > k, S(n − 1) being true implies S(n) true, 

then the statements S(n) are true for all n ≥ k. 

We’ll prove this in Theorem 2.17 (you’ll see then that the proof is our story 
about Fido), but let’s use the theorem now to prove some interesting results. 
We start by completing our argument that 2ⁿ > n. 

Proposition 2.2. 2ⁿ > n for all n ≥ 1. 

Proof. Here k = 1, and the statements are 

S(1): 2¹ > 1, ..., S(n − 1): 2ⁿ⁻¹ > n − 1, S(n): 2ⁿ > n, ... 

Base Step: If n = 1, then 2¹ = 2 > 1, so S(1) is true. 

Inductive Step: We need to show that if n > 1 and S(n − 1) is true, then S(n) 
is true. It is always a good idea to write the statements out so that we can see 
what needs to be proved. Here, we must show that if the inductive hypothesis 

S(n − 1): 2ⁿ⁻¹ > n − 1 

is true, then so is S(n); that is, 2ⁿ⁻¹ > n − 1 implies 2ⁿ > n. Multiply both 
sides of the inequality S(n − 1) by 2: if 2ⁿ⁻¹ > n − 1, then 

2ⁿ = 2 · 2ⁿ⁻¹ > 2(n − 1) = (n − 1) + (n − 1) ≥ (n − 1) + 1 = n 

(the last inequality holds because n > 1 implies n − 1 ≥ 1). Thus, if 2ⁿ⁻¹ > 
n − 1 is true, then 2ⁿ > n is also true. 

Since both the base step and the inductive step hold, Theorem 2.1 says that 
all the statements are true: 2ⁿ > n for all n ≥ 1. ■ 


Etymology. The word induction comes from the Latin word meaning to lead 
into or to influence. It is used here because, as we have just seen, the truth of 
the nth statement arises from the truth of the previous statement. 


Usually the base step in an inductive proof occurs when k = 1, although 
many proofs occur when k = 0 (see Exercise 2.4 on page 52). Here is an 
example of an induction whose base step occurs when k = 5. Consider the 
statements 


S(n): 2ⁿ > n². 

This is not true for small values of n: if n = 2 or 4, then there is equality, not 
inequality; if n = 3, the left side, 8, is smaller than the right side, 9. However, 
S(5) is true: 32 > 25. 

Proposition 2.3. 2ⁿ > n² for all integers n ≥ 5. 


Many people prefer to write the inductive step as S(n) ⇒ S(n + 1) instead 
of S(n − 1) ⇒ S(n) as we do. The difference is cosmetic; the important 
thing is the passage from one statement to the next one. 


Define S(0): 2⁰ > 0. Suppose we had taken the base step in Proposition 2.2 
at k = 0. Can you write out a proof that S(0) ⇒ S(1)? 
Proof. We have just checked the base step S(5). Suppose that n > 5 and that 

2ⁿ⁻¹ > (n − 1)². (2.1) 

Can we use this to show that 2ⁿ > n²? Multiply both sides of inequality (2.1) 
by 2 to obtain 


2ⁿ > 2(n − 1)². 

We’ll be done if we show, for n > 5, that 2(n − 1)² > n². Now 

2(n − 1)² = (n − 1)² + (n − 1)(n − 1) 
> (n − 1)² + 4(n − 1) since n − 1 > 4 
= (n² − 2n + 1) + 4(n − 1) 
= n² + 2n − 3. 

But 2n − 3 is positive, because n > 5, and so n² + 2n − 3 > n². ■ 
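
A quick numerical look at the statements S(n): 2ⁿ > n² shows why the base step has to be taken at k = 5. The Python sketch below is an illustration only: the function name S follows the notation of the text, and the upper bound 200 is an arbitrary cutoff chosen here. A finite check of this kind supports the proposition but cannot replace the induction.

```python
def S(n: int) -> bool:
    """The statement S(n): 2**n > n**2."""
    return 2 ** n > n ** 2


# Small values of n for which S(n) fails.
failures = [n for n in range(1, 5) if not S(n)]
print(failures)

# S(n) for every n from the base step 5 up to the cutoff.
print(all(S(n) for n in range(5, 200)))

# The inequality 2(n - 1)^2 > n^2 used in the inductive step, for n > 5.
print(all(2 * (n - 1) ** 2 > n ** 2 for n in range(6, 200)))
```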

We now use induction to prove a geometric result. 

Definition. A polygon P in the plane is convex if, for every pair of distinct 
points A, B on its perimeter, the line segment AB lies inside of P . 

For example, every triangle is convex, but there are quadrilaterals that are 
not convex. For example, the shaded quadrilateral in Figure 2.1 is not convex, 
for the line segment joining boundary points A and B is not wholly inside it. 



Figure 2.1. Non-convex polygon. 


Proposition 2.4. Let P be a convex polygon with vertices V₁, ..., Vₙ. If θᵢ is 
the (interior) angle at Vᵢ, then 


θ₁ + ··· + θₙ = (n − 2)180°. 


Proof. The proof is by induction on n ≥ 3. For the base step n = 3, the 
polygon is a triangle, and it is well known that the sum of the interior angles 
is 180°. For the inductive step n > 3, let P be a convex polygon with vertices 
V₁, ..., Vₙ. Since P is convex, the segment joining V₁ and Vₙ₋₁ lies wholly 
inside P; it divides P into the triangle Δ = ΔV₁VₙVₙ₋₁ and the polygon P′ 
having vertices V₁, ..., Vₙ₋₁. Now P′ is convex (why?), so that the inductive 
hypothesis says that the sum of its interior angles is (n − 3)180°. Figure 2.2 
shows that the sum of the interior angles of P is the sum of the angles of Δ 
and those of P′, which is 180° + (n − 3)180° = (n − 2)180°. ■ 



Figure 2.2. Convex polygon. 


In any proof by induction, we must verify both the base step and the induc- 
tive step; verification of only one of them is insufficient. For example, consider 
the statements S(n): n² = n. The base step, S(1), is true, but the inductive 
step is false; of course, these statements S(n) are false for all n > 1. Another 
example is given by the statements S(n): n = n + 1. It is easy to see that the 
inductive step is true: if S(n − 1) is true, i.e., if n − 1 = (n − 1) + 1, then adding 
1 to both sides gives n = (n − 1) + 2 = n + 1, which is the next statement, 
S(n). But the base step is false; of course, all these statements S(n) are false. 


How to Think About It. When first seeing induction, many people sus- 
pect that the inductive step is circular reasoning. Why are you allowed to use 
statement S(n − 1), which you don’t know is true, to prove that S(n) is true? 
Isn’t the truth of S(n − 1) essentially what you are supposed to be proving? 
A closer analysis shows that this is not at all what is happening. The inductive 
step, by itself, does not prove that S(n) is true. Rather, it says that if S(n − 1) 
is true, then S(n) is also true. In other words, the inductive step proves that the 
implication “If S(n − 1) is true, then S(n) is true” is correct. The truth of this 
implication is not the same thing as the truth of its conclusion. For example, 
consider the two statements: “Your grade on every exam is 100%” and “Your 
grade for the course is A.” The implication “If all your exams are perfect, then 
you will get the highest grade for the course” is true. Unfortunately, this does 
not say it is inevitable that your grade for the course will be A. Here is a mathematical 
example: the implication “If n − 1 = n, then n = n + 1” is true, but 
the conclusion “n = n + 1” is false. 


From now on, we usually abbreviate mathematical induction to induction. 
Here is the first example of a proof by induction often given in most texts. 

Proposition 2.5. For every integer n ≥ 1, we have 

1 + 2 + ··· + n = ½n(n + 1). 

Proof. The proof is by induction on n ≥ 1. 

Base step. If n = 1, then the left-hand side is 1 and the right-hand side is 
½ · 1(1 + 1) = 1, as desired. 

Inductive step. The (n − 1)st statement is 

S(n − 1): 1 + 2 + ··· + (n − 1) = ½(n − 1)n, 

and we must show that S(n) holds. Now 

1 + 2 + ··· + n = [1 + 2 + ··· + (n − 1)] + n. 

By the inductive hypothesis, the right-hand side is 

½(n − 1)n + n. 

But ½(n − 1)n + n = ½n(n + 1). By induction, the formula holds for all 
n ≥ 1. ■ 


Historical Note. Here is one version of a popular story. As a 7-year old 
prodigy, Gauss was examined by two mathematicians to evaluate his mathematical 
ability. When asked to add up all the numbers from 1 to 100, he thought 
a moment and then said the answer was 5050. Gauss let s denote the sum of 
all the numbers from 1 to 100: s = 1 + 2 + ··· + 99 + 100. Of course, 
s = 100 + 99 + ··· + 2 + 1. Arrange these nicely 

s = 1 + 2 + ··· + 99 + 100 
s = 100 + 99 + ··· + 2 + 1 

and add 

2s = 101 + 101 + ··· + 101 + 101, 

the sum 101 occurring 100 times. We now solve: s = ½(100 × 101) = 5050. 
This argument is valid for any number n in place of 100 (and there is no obvi- 
ous use of induction!). Not only does this give a new proof of Proposition 2.5, 
it shows how the formula could have been discovered. 
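
Both arguments for the formula 1 + 2 + ··· + n = ½n(n + 1) can be mirrored in a short Python sketch. It is an illustration only: the names triangular and gauss_pairing are introduced here, and the range of n that is checked is an arbitrary choice.

```python
def triangular(n: int) -> int:
    """Closed form n(n + 1)/2 from Proposition 2.5."""
    return n * (n + 1) // 2


def gauss_pairing(n: int) -> int:
    """Add the list 1..n to its reverse, then halve the total."""
    forward = list(range(1, n + 1))
    backward = forward[::-1]
    doubled = sum(x + y for x, y in zip(forward, backward))  # n copies of n + 1
    return doubled // 2


# The algebra of the inductive step: S(n - 1) plus n gives S(n).
print(all(triangular(n - 1) + n == triangular(n) for n in range(2, 1001)))

# Gauss's sum of the numbers from 1 to 100.
print(gauss_pairing(100), triangular(100))
```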


Example 2.6. Another proof of the formula in Proposition 2.5 comes from an 
analysis of the square in Figure 2.3. 


We have $n = 7$ in Figure 2.3.

Figure 2.3. $\sum_{k=1}^{n} k$.

Imagine an $(n + 1) \times (n + 1)$ square. It contains $(n + 1)^2$ small unit squares.
Since there are $n + 1$ unit squares on the diagonal, there are

$$(n + 1)^2 - (n + 1) = n^2 + n$$

unit squares off the diagonal. Half of them, $\tfrac{1}{2}(n^2 + n)$, are above the diagonal.
But, if you count by rows, there are

$$1 + 2 + \cdots + n$$

unit squares above the diagonal. Hence

$$1 + 2 + \cdots + n = \tfrac{1}{2}(n^2 + n).$$ ▲


How to Think About It. Proposition 2.5 illustrates a common problem stu- 
dents have when learning induction. Induction itself is a technique of proof 
(involving just two steps), but it is not a method of discovery. However, the 
two notions of proof and discovery are often intertwined. For example, merely 
applying mathematical induction, as we did in the proof of Proposition 2.5, 
is straightforward. But many beginning students get confused because, at the 
same time as they are following the steps of the proof, they are also wonder- 
ing where the formula for the sum comes from. In contrast, neither Gauss’s 
proof nor the proof using the $(n + 1) \times (n + 1)$ square is confusing, for the
ideas of these proofs and their techniques of proof are separate. In Section 2.3, 
we’ll describe a method for introducing mathematical induction to high school 
students that usually minimizes this confusion. 


Aside from proving statements, induction can also be used to define terms. 
For example, here is an inductive definition of factorial. 

Definition. Define $0! = 1$ and, if $n > 0$, define $n! = n \cdot (n - 1)!$. In other
words, $n!$ is defined by

$$n! = \begin{cases} 1 & \text{if } n = 0 \\ n \cdot (n - 1)! & \text{if } n > 0. \end{cases}$$


Inductive definitions are
often called recursive
definitions.

Defining 0! = 1 is con- 
venient, as we shall see 
in the next section when 
we discuss the Binomial 
Theorem. 
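
An inductive definition translates directly into a recursive function. Here is a minimal Python sketch of the definition of factorial above; the function name `factorial` and the error handling for negative input are our own choices, not part of the text.

```python
def factorial(n: int) -> int:
    """n! computed exactly as in the inductive definition."""
    if n < 0:
        raise ValueError("n! is defined here only for n >= 0")
    if n == 0:
        return 1                     # base case: 0! = 1
    return n * factorial(n - 1)      # inductive case: n! = n * (n - 1)!


if __name__ == "__main__":
    print([factorial(n) for n in range(6)])
```

The base case plays the role of the base step, and the recursive call plays the role of the inductive step.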


Induction allows us to define the powers of a number. 


Definition. If $a \in \mathbb{R}$, define the powers of $a$, for $n \ge 0$, by induction:

$$a^n = \begin{cases} 1 & \text{if } n = 0 \\ a^{n-1}a & \text{if } n > 0. \end{cases}$$

If $a = 0$, we have defined $0^0 = 1$.


Etymology. The terminology x square and x cube for x 2 and x 3 is, of 
course, geometric in origin. Usage of the term power in this context arises 
from a mistranslation of the Greek dunamis (from which the word dynamo 
comes) as used by Euclid. The standard European rendition of dunamis was 
“power;” for example, the first English translation of Euclid’s Elements by 
H. Billingsley in 1570, renders a sentence of Euclid as “The power of a line is 
the square on the same line” (which doesn’t make much sense to us). However, 
contemporaries of Euclid, e.g., Plato and Aristotle, used dunamis to mean “am- 
plification.” This seems to be a more appropriate translation, for Euclid was 
probably thinking that a one-dimensional line segment can sweep out a two- 
dimensional square. We thank Donna Shalev for informing us of the classical 
usage of dunamis. 
Proposition 2.7 (Laws of Exponents). Let $a \in \mathbb{R}$ and $m, n \ge 0$ be integers.

(i) $a^{m+n} = a^m a^n$.

(ii) $(a^m)^n = a^{mn}$.

Proof. (i) The statement is true for all $n \ge 0$ when $m = 0$. We prove
$a^{m+n} = a^m a^n$ is true for all $n$ by induction on $m \ge 1$. The base step
says that $a a^n = a^{n+1}$, which is just the definition of powers. For the
inductive step,

$$\begin{aligned}
a^{m+n} &= a^{m+n-1}a && \text{definition of powers} \\
&= a^{m-1+n}a \\
&= a^{m-1}a^n a && \text{inductive hypothesis} \\
&= a^{m-1}a^{n+1} && \text{definition of powers} \\
&= a^{m-1+n+1} && \text{inductive hypothesis} \\
&= a^{m+n}.
\end{aligned}$$



(ii) The statement is true for all $m \ge 0$ when $n = 0$. We prove $(a^m)^n = a^{mn}$
is true for all $m$ by induction on $n \ge 1$. The base step says that $(a^m)^1 =
a^{m \cdot 1} = a^m$, which is obvious. For the inductive step,

$$\begin{aligned}
(a^m)^n &= (a^m)^{n-1}a^m && \text{definition of powers} \\
&= a^{m(n-1)}a^m && \text{inductive hypothesis} \\
&= a^{m(n-1)+m} && \text{part (i)} \\
&= a^{mn}.
\end{aligned}$$ ■
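
The inductive definition of powers and the two Laws of Exponents can be spot-checked by machine. The sketch below is illustrative only: the name `power` is ours, and it uses integer bases so that the comparisons are exact (floating-point bases would need a tolerance).

```python
def power(a: int, n: int) -> int:
    """a^n from the inductive definition: a^0 = 1 and a^n = a^(n-1) * a."""
    if n < 0:
        raise ValueError("only exponents n >= 0 are defined here")
    if n == 0:
        return 1          # note that this gives 0^0 = 1 as well
    return power(a, n - 1) * a


def laws_hold(a: int, m: int, n: int) -> bool:
    """Check (i) a^(m+n) = a^m a^n and (ii) (a^m)^n = a^(mn) for one triple."""
    law_i = power(a, m + n) == power(a, m) * power(a, n)
    law_ii = power(power(a, m), n) == power(a, m * n)
    return law_i and law_ii


if __name__ == "__main__":
    print(all(laws_hold(a, m, n)
              for a in range(-3, 4) for m in range(6) for n in range(6)))
```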


Historical Note. The earliest known occurrence of mathematical induction 
is in Sefer ha-Mispar (also called Maaseh Hoshev, whose Hebrew title means 
practical and theoretical calculating), written by Levi ben Gershon in 1321 
(he is also known as Gersonides or as RaLBaG, the acronym for Rabbi Levi 
ben Gershon). Induction appears later in Arithmeticorum libri duo, written by 
Maurolico in 1557, and also in Traité du Triangle Arithmétique, written by
Pascal around 1654 (in which Pascal discusses the Binomial Theorem). 


Exercises 

2.1 * Guess a formula for 1 + J -G and use mathematical induction to prove 

that your formula is correct. 

2.2 * Prove that if $m \ge 2$ is an integer not divisible by any prime $p$ with $p \le \sqrt{m}$,
then $m$ is a prime. Use this to prove that the numbers $n^2 - n + 41$ are prime for
all $n \le 40$.

2.3 * Let $m_1, m_2, \ldots, m_n$ be integers such that $\gcd(m_i, m_j) = 1$ for all $i \ne j$. If
each $m_i$ divides an integer $k$, prove that their product $m_1 m_2 \cdots m_n$ also divides $k$.
Hint. Use Exercise 1.58 on page 35.

2.4 * If $a$ is positive, give two proofs that

$$1 + a + a^2 + \cdots + a^{n-1} = \frac{a^n - 1}{a - 1},$$

by induction on n > 0 and by multiplying the left-hand expression by $(a - 1)$.

2.5 Let $x > -1$ be a real number. Prove that $(1 + x)^n \ge 1 + nx$ for all $n \ge 1$.

2.6 What is the smallest value of $k$ so that $2^n > n^3$ for all $n \ge k$? Why?

2.7 Assuming the product rule for derivatives, $(fg)' = f'g + fg'$, prove that
$(x^n)' = nx^{n-1}$ for all integers $n \ge 1$.

2.8 In high school, $n!$ is usually defined as $1 \cdot 2 \cdot 3 \cdots n$. Show that this agrees with
the definition on page 51 for all $n \ge 1$.

2.9 (Double Induction) Let $k, k'$ be integers, and let $S(m, n)$ be a doubly indexed
family of statements, one for each pair of integers $m \ge k$ and $n \ge k'$. Suppose
that

(i) $S(k, k')$ is true,

(ii) if $S(m - 1, k')$ is true, then $S(m, k')$ is true,

(iii) if $S(m, n - 1)$ is true for all $m \ge k$, then $S(m, n)$ is true for all $m \ge k$.

Prove that $S(m, n)$ is true for all $m \ge k$ and $n \ge k'$.

2.10 Prove that $(m + 1)^n > mn$ for all $m, n \ge 1$.

2.11 Prove the Laws of Exponents by Double Induction. 

Unique Factorization 

Induction is useful in number theory. As a simple example, we generalize Eu- 
clid’s Lemma to more than two factors. 

Theorem 2.8 (Euclid’s Lemma). If $p$ is a prime and $p \mid a_1 a_2 \cdots a_n$, where
$n \ge 2$, then $p \mid a_i$ for some $i$.

Proof. The proof is by induction on $n \ge 2$. The base step is Theorem 1.21. To
prove the inductive step, suppose that $p \mid a_1 a_2 \cdots a_n$. We may group the
factors on the right side together so there are only two factors: $(a_1 a_2 \cdots a_{n-1})a_n$.
By Theorem 1.21, either $p \mid a_1 a_2 \cdots a_{n-1}$ or $p \mid a_n$. In the first case, the
inductive hypothesis gives $p \mid a_i$ for some $i \le n - 1$ and we are done. In the
second case, $p \mid a_n$, and we are also done. ■

This proof illustrates an empirical fact. It is not always the case, in an in- 
ductive proof, that the base step is very simple. In fact, all possibilities can 
occur: both steps can be easy, both can be difficult, or one can be harder than 
the other. 

Here is an amusing inductive proof (due to Peter Braunfeld) of the existence 
of the quotient and remainder in the Division Algorithm. 

Proposition 2.9. If $a$ and $b$ are positive integers, then there are integers $q$
and $r$ with $b = qa + r$ and $0 \le r < a$.

Proof. We do induction on $b \ge 1$.

The base step: $b = 1$. Now $a \ge 1$, because it is a positive integer. If $a = 1$,
choose $q = 1$ and $r = 0$; if $a > 1$, choose $q = 0$ and $r = 1$.

Let’s prove the inductive step. The inductive hypothesis is $b - 1 = qa + r$,
where $0 \le r < a$. It follows that $b = qa + r + 1$. Now $r < a$ implies $r + 1 \le a$.
If $r + 1 < a$, we are done. If $r + 1 = a$, then $b = qa + (r + 1) = qa + a =
(q + 1)a$, and we are done in this case as well. ■
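
The inductive proof above is constructive, so it can be read as a (slow) recursive procedure. The following Python sketch mirrors the two steps of the proof; the function name `divide` is our own, and the comparison with Python's built-in `divmod` is only a sanity check.

```python
def divide(b: int, a: int):
    """Return (q, r) with b == q * a + r and 0 <= r < a, by induction on b."""
    if a < 1 or b < 1:
        raise ValueError("a and b must be positive integers")
    if b == 1:
        # Base step: q = 1, r = 0 if a = 1; otherwise q = 0, r = 1.
        return (1, 0) if a == 1 else (0, 1)
    # Inductive hypothesis: b - 1 = q * a + r with 0 <= r < a.
    q, r = divide(b - 1, a)
    if r + 1 < a:
        return q, r + 1      # b = q * a + (r + 1)
    return q + 1, 0          # r + 1 = a, so b = (q + 1) * a


if __name__ == "__main__":
    print(divide(17, 5), divmod(17, 5))
```

Because the recursion steps down from $b$ to $b - 1$, this is far less efficient than ordinary long division; it is meant only to show that the proof computes something.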


“The proof is by induction 
on $n \ge 2$” not only
indicates the base step, it 
also tells which variable 
will be changing in the 
inductive step. 
We often use the word
product even when there 
is only one factor. Thus, 
a prime is a product of 
primes. 


We now use induction to prove unique factorization into primes. 

Theorem 2.10 (Fundamental Theorem of Arithmetic). Every integer $a \ge 2$
is a product of primes. Moreover, if

$$a = p_1 \cdots p_m \quad\text{and}\quad a = q_1 \cdots q_n,$$

where the $p$’s and $q$’s are primes, then $n = m$ and the $q$’s can be re-indexed
so that $q_i = p_i$ for all $i$.

Proof. The existence of a factorization is Theorem 1.14. To prove uniqueness,
we may assume that $m \ge n$, and we use induction on $m \ge 1$. The base step is
obvious: if $m = 1$, then $n = 1$ and the given equations are $a = p_1 = q_1$. For
the inductive step, the equation

$$p_1 \cdots p_m = q_1 \cdots q_n$$

gives $p_m \mid q_1 \cdots q_n$. By Euclid’s Lemma, there is some $i$ with $p_m \mid q_i$. But
$q_i$, being a prime, has no positive divisors other than 1 and itself, so that
$q_i = p_m$. Re-indexing, we may assume that $q_n = p_m$. Canceling, we have
$p_1 \cdots p_{m-1} = q_1 \cdots q_{n-1}$. By the inductive hypothesis, $n - 1 = m - 1$ (so
that $n = m$) and the $q$’s may be re-indexed so that $q_i = p_i$ for all $i \le m$. ■

Corollary 2.11. If $a \ge 2$ is an integer, then there are distinct primes $p_i$ and
integers $e_i > 0$ with

$$a = p_1^{e_1} \cdots p_n^{e_n}.$$

Moreover, if there are distinct primes $q_j$ and integers $f_j > 0$ with

$$p_1^{e_1} \cdots p_n^{e_n} = q_1^{f_1} \cdots q_m^{f_m},$$

then $m = n$, $q_i = p_i$ and $f_i = e_i$ for all $i$ (after re-indexing the $q$’s).

Proof. Just collect like terms in a prime factorization. ■

The Fundamental Theorem of Arithmetic says that the exponents $e_1, \ldots, e_n$
in the prime factorization $a = p_1^{e_1} \cdots p_n^{e_n}$ are well-defined integers determined
by $a$. It would not make sense to speak of the exponent of $q$ dividing $a$ if the
Fundamental Theorem were false; if an integer $a$ had two factorizations, say,
$a = p^2 q^5 r^6$ and $a = p^2 q^3 s^7$, where $p, q, r, s$ are distinct primes, would its
$q$-exponent be 5 or 3?

It is often convenient to allow factorizations $p_1^{e_1} \cdots p_n^{e_n}$ having some
exponents $e_i = 0$, because this allows us to use the same set of primes when
factoring two numbers. For example, $168 = 2^3 3^1 7^1$ and $60 = 2^2 3^1 5^1$ may be
rewritten as $168 = 2^3 3^1 5^0 7^1$ and $60 = 2^2 3^1 5^1 7^0$.

Lemma 2.12. Let positive integers $a$ and $b$ have prime factorizations

$$a = p_1^{e_1} \cdots p_n^{e_n} \quad\text{and}\quad b = p_1^{f_1} \cdots p_n^{f_n},$$

where $p_1, \ldots, p_n$ are distinct primes and $e_i, f_i \ge 0$ for all $i$. Then $a \mid b$ if and
only if $e_i \le f_i$ for all $i$.

Proof. If $e_i \le f_i$ for all $i$, then $b = ac$, where $c = p_1^{f_1 - e_1} \cdots p_n^{f_n - e_n}$. Now $c$
is an integer, because $f_i - e_i \ge 0$ for all $i$, and so $a \mid b$.

Conversely, if $b = ac$, let the prime factorization of $c$ be $c = p_1^{g_1} \cdots p_n^{g_n}$,
where $g_i \ge 0$ for all $i$. It follows from the Fundamental Theorem of Arithmetic
that $e_i + g_i = f_i$ for all $i$, and so $f_i - e_i = g_i \ge 0$ for all $i$; that is, $e_i \le f_i$
for all $i$. ■

Definition. A common multiple of integers $a$ and $b$ is an integer $m$ with $a \mid m$
and $b \mid m$. The least common multiple, denoted by

$$\operatorname{lcm}(a, b),$$

is the smallest positive common multiple if both $a, b \ne 0$, and it is 0 otherwise.

The following proposition describes gcd’s in terms of prime factorizations. 
This is, in fact, the method usually taught to students in elementary school for 
putting fractions into lowest terms. 

Proposition 2.13. Let $a = p_1^{e_1} \cdots p_n^{e_n}$ and $b = p_1^{f_1} \cdots p_n^{f_n}$, where $p_1, \ldots, p_n$
are distinct primes and $e_i \ge 0$, $f_i \ge 0$ for all $i$. Then

$$\gcd(a, b) = p_1^{m_1} \cdots p_n^{m_n} \quad\text{and}\quad \operatorname{lcm}(a, b) = p_1^{M_1} \cdots p_n^{M_n},$$

where $m_i = \min\{e_i, f_i\}$ and $M_i = \max\{e_i, f_i\}$.

Proof. Define $d = p_1^{m_1} \cdots p_n^{m_n}$. Lemma 2.12 shows that $d$ is a (positive)
common divisor of $a$ and $b$; moreover, if $c$ is any (positive) common divisor,
then $c = p_1^{g_1} \cdots p_n^{g_n}$, where $0 \le g_i \le \min\{e_i, f_i\} = m_i$ for all $i$. Therefore,
$c \mid d$.

A similar argument shows that $D = p_1^{M_1} \cdots p_n^{M_n}$ is a common multiple
that divides every other such. ■

Computing the gcd for small numbers $a$ and $b$ using their prime factorizations
is more efficient than using Euclidean Algorithm I. For example, since
$168 = 2^3 3^1 5^0 7^1$ and $60 = 2^2 3^1 5^1 7^0$, we have $\gcd(168, 60) = 2^2 3^1 5^0 7^0 =
12$ and $\operatorname{lcm}(168, 60) = 2^3 3^1 5^1 7^1 = 840$. However, finding the prime
factorization of a large integer is very inefficient, even with today’s fanciest
computers; it is so inefficient that this empirical fact is one of the main ingredients
in public key cryptography, the basic reason you can safely submit your credit
card number when buying something online.
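
For small numbers, the min/max recipe of Proposition 2.13 is easy to carry out by machine. The sketch below is a hedged illustration: the helper names `prime_factorization`, `from_exponents`, and `gcd_lcm` are ours, and the factoring step uses plain trial division, which is practical only for small inputs.

```python
from collections import Counter


def prime_factorization(n: int) -> Counter:
    """Map each prime p dividing n to its exponent, by trial division."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    exponents = Counter()
    p = 2
    while p * p <= n:
        while n % p == 0:
            exponents[p] += 1
            n //= p
        p += 1
    if n > 1:                 # whatever is left over is itself a prime
        exponents[n] += 1
    return exponents


def from_exponents(exponents) -> int:
    """Multiply out a product of prime powers p^e."""
    result = 1
    for p, e in exponents.items():
        result *= p ** e
    return result


def gcd_lcm(a: int, b: int):
    """gcd and lcm via min and max of exponents (Proposition 2.13)."""
    ea, eb = prime_factorization(a), prime_factorization(b)
    primes = set(ea) | set(eb)   # a Counter reports exponent 0 for absent primes
    g = from_exponents({p: min(ea[p], eb[p]) for p in primes})
    m = from_exponents({p: max(ea[p], eb[p]) for p in primes})
    return g, m


if __name__ == "__main__":
    print(gcd_lcm(168, 60))
```

Using a `Counter` makes the "allow exponent 0" convention automatic: a prime missing from one factorization simply has exponent 0 there.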

Corollary 2.14. If a and b are positive integers, then 
lcm(a, b) gcd(a, b) = ab. 

Proof. The result follows from Proposition 2.13 and Exercise 1.72 on page 40:

$$m_i + M_i = e_i + f_i,$$

where $m_i = \min\{e_i, f_i\}$ and $M_i = \max\{e_i, f_i\}$. ■

Since gcd’s can be computed by Euclidean Algorithm I, this corollary
allows us to compute lcm’s:

$$\operatorname{lcm}(a, b) = ab/\gcd(a, b).$$


Notice how a computational inquiry has given a
theorem.
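
As a small illustration of this formula, here is a Python sketch that computes lcm’s from gcd’s. It assumes the familiar remainder form of the Euclidean algorithm, which may differ in detail from the text's Euclidean Algorithm I; the names `gcd` and `lcm` are our own, and inputs are assumed to be nonnegative.

```python
def gcd(a: int, b: int) -> int:
    """Greatest common divisor by repeated remainders (nonnegative inputs)."""
    while b:
        a, b = b, a % b
    return a


def lcm(a: int, b: int) -> int:
    """lcm(a, b) = ab / gcd(a, b); by definition it is 0 if a or b is 0."""
    if a == 0 or b == 0:
        return 0
    return a * b // gcd(a, b)


if __name__ == "__main__":
    print(gcd(168, 60), lcm(168, 60))
```

No factoring is needed here, which is why this route stays practical even when $a$ and $b$ are large.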
Example 2.15. Sudoku is a popular puzzle. One starts with a 9 × 9 grid of
cells, some filled with numbers. The object is to insert numbers in the blank
cells so that every row, every column, and every heavily bordered 3 × 3 box
contains the digits 1 through 9 exactly once.

Figure 2.4. Sudoku.


KenKen® is a registered 
trademark of Nextoy, LLC, 
2012, KenKen Puzzle LLC. 
All rights reserved. 


KenKen is a variation of Sudoku. As in Sudoku, the object is to fill an n x n 
grid with digits: 1 through 4 for a 4 x 4 grid, 1 through 5 for a 5 x 5 grid, etc., 
so that no digit appears more than once in any row or column. That the cells 
in Sudoku are filled with 1,2, . . . , 9 is not important; one could just as well 
use the first nine letters a, b, . . . , i instead. In contrast, KenKen uses arithmetic. 
KenKen grids are divided into heavily bordered groups of cells, called cages , 
and the numbers in the cells in each cage must produce a target number when 
combined using a specified mathematical operation — either addition, subtrac- 
tion, multiplication or division. Here is a 5 x 5 KenKen puzzle and its solution. 
Figure 2.5. KenKen® puzzle. Figure 2.6. KenKen® solution.

The difficulty in solving a KenKen puzzle arises from there being too many 
ways to fill in each cage. Sometimes, the Fundamental Theorem of Arithmetic 
can help. Let’s start solving the puzzle in Figure 2.5. We view the grid as a
5 × 5 matrix, and we’ll abbreviate “target-operation” to T-O. Consider the
L-shaped cage consisting of 4 cells whose target-operation is 60×. There are two
possibilities: its cells are filled with an arrangement either of 2, 2, 3, 5 or of
1, 3, 4, 5. Assume the first possibility holds. Since we cannot have both 2s in
the same row or the same column, one 2 is in position (4, 3); the other 2 is
either in position (5, 1) or (5, 2). Suppose 2 sits in the (5, 1) position. There is
a cage in the first column with T-O 4−; its cells must contain 1 and 5. Hence,
the other cage, with T-O 2÷, must contain 3 and 5; it cannot. Thus, 2 sits in
position (5, 2). There is a cage in the second column with T-O 4−, and its cells
must contain 1 and 5. This says that the top two cells in the second column
contain 3 and 4. But the L-shaped cage with T-O 12× must now have 1 in
position (1, 3). This forces the column one cage, with T-O 4−, to have 5 in
position (1, 1), because it can’t be 1. Thus, the last cage in the first row cannot
involve 1 or 5. But the only ways to fill in a 2-cell cage with T-O 3− are with
1 and 4 or with 2 and 5. Conclusion: The 4-cell cage with T-O 60× must be an
arrangement of 1, 3, 4, 5. The full solution is given in Figure 2.6. ▲
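
The claim that a 4-cell cage with T-O 60× has only two possible fillings in a 5 × 5 grid can be confirmed by brute force. The Python sketch below is illustrative; the function name `cage_candidates` and its parameters are hypothetical, and it ignores the row and column restrictions, which the argument above applies by hand.

```python
from itertools import combinations_with_replacement
from math import prod


def cage_candidates(target: int, cells: int, size: int):
    """All multisets of `cells` digits from 1..size whose product is `target`."""
    digits = range(1, size + 1)
    return [combo
            for combo in combinations_with_replacement(digits, cells)
            if prod(combo) == target]


if __name__ == "__main__":
    print(cage_candidates(60, 4, 5))
```

The search agrees with the Fundamental Theorem of Arithmetic: since $60 = 2^2 \cdot 3 \cdot 5$, the two 2s must either appear separately or be combined into a single 4 (with a 1 filling the remaining cell).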

Strong Induction 

Certain situations call for a variant of induction, called Strong Induction (or 
the Second Form of Induction). 

Definition. Given integers $k$ and $n \ge k$, the predecessors of $n$ are the integers
$\ell$ with $k \le \ell < n$, namely, $k, k + 1, \ldots, n - 1$ ($k$ has no predecessor).

Theorem 2.16 (Strong Induction). Let $k$ be an integer. If $S(k)$, $S(k + 1)$,
$S(k + 2)$, ... is a sequence of statements such that

(i) Base Step: $S(k)$ is true, and

(ii) Inductive Step: If, for $n > k$, $S(\ell)$ being true for all predecessors $\ell$ of $n$
implies $S(n)$ true,

then the statements $S(n)$ are true for all $n \ge k$.


How to Think About It. Let’s compare the two forms of induction. Both
start by verifying the base step, and both have an inductive step to prove
$S(n)$. The inductive hypothesis in the first form is that $S(n - 1)$ is true; the
inductive hypothesis in Strong Induction is that all the preceding statements
$S(k), \ldots, S(n - 1)$ are true. Thus, Strong Induction has a stronger inductive
hypothesis (actually, each of Theorems 2.1 and 2.16 implies the other).


We are going to prove Theorem 2.16 and Theorem 2.1 simultaneously (we
haven’t yet proved the latter theorem). But first we need an easy technical
remark. The Least Integer Axiom says that every nonempty subset $C$ of the
natural numbers $\mathbb{N}$ contains a smallest number; that is, there is some $c_0 \in C$
with $c_0 \le c$ for all $c \in C$. This axiom holds, not only for $\mathbb{N}$, but for any subset

$$\mathbb{N}_k = \{n \in \mathbb{Z} : n \ge k\}$$

as well, where $k$ is a fixed, possibly negative, integer. If $k \ge 0$, then $\mathbb{N}_k \subseteq \mathbb{N}$,
and there is nothing to prove; if $k < 0$, then we argue as follows. Let $C \subseteq
\mathbb{N}_k$ be a nonempty subset. If $C$ contains no negative integers, then $C \subseteq \mathbb{N}$,
and the Least Integer Axiom applies; otherwise, keep asking, in turn, whether
$k, k + 1, \ldots, -1$ are in $C$, and define $c_0$ to be the first one that lies in $C$.

We have already seen the basic idea of the next proof, involving Fido, on
page 46. Here are the two statements again.

Theorem 2.17 (= Theorems 2.1 and 2.16). Let $k$ be an integer. If $S(k)$,
$S(k + 1)$, $S(k + 2)$, ... is a sequence of statements such that

Base Step: $S(k)$ is true, and
Inductive Step: either

(i) if, for $n > k$, $S(n - 1)$ being true implies $S(n)$ true,
or

(ii) if, for $n > k$, $S(\ell)$ being true for all predecessors $\ell$ of $n$ implies $S(n)$
true,

then the statements $S(n)$ are true for all $n \ge k$.

Proof. Let $\mathbb{N}_k = \{n \in \mathbb{Z} : n \ge k\}$. We show that there are no integers $n \in \mathbb{N}_k$
for which $S(n)$ is false. Otherwise, the subset $C \subseteq \mathbb{N}_k$, consisting of all $n$ for
which $S(n)$ is false, is nonempty, and so $C$ has a smallest element, say, $c_0$. The
truth of the base step says that $k < c_0$, so that $c_0 - 1$ lies in $\mathbb{N}_k$ and hence there
is a statement $S(c_0 - 1)$.

Case 1. As $S(c_0)$ is the first false statement, $S(c_0 - 1)$ must be true. Assuming
inductive step (i), $S(c_0) = S((c_0 - 1) + 1)$ is true, and this is a contradiction.

Case 2. As $S(c_0)$ is the first false statement, all the statements $S(\ell)$, where $\ell$ is
a predecessor of $c_0$, are true. Assuming inductive step (ii), the strong version,
we again reach the contradiction that $S(c_0)$ is true.

In either case, $C = \varnothing$ (i.e., $C$ is empty), which says that every $S(n)$ is
true. ■

Here’s a second proof that prime factorizations exist. 

Proposition 2.18 (= Proposition 1.14). Every integer $n \ge 2$ is a product of
primes.

Proof. The base step $S(2)$ is true because 2 is a prime. We prove the inductive
step. If $n > 2$ is a prime, we are done. Otherwise, $n = ab$, where $2 \le a < n$
and $2 \le b < n$. As $a$ and $b$ are predecessors of $n$, each of them is a product of
primes:

$$a = pp' \cdots \quad\text{and}\quad b = qq' \cdots.$$

Hence, $n = pp' \cdots qq' \cdots$ is a product of (at least two) primes. ■
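
The strong-induction proof suggests a recursive procedure: if $n$ is not prime, split it as $n = ab$ and factor the two smaller numbers. Here is a minimal Python sketch of that idea. The name `prime_factors` is ours, and the choice of the smallest divisor $a \ge 2$ as the split is just one convenient option (any nontrivial factorization would serve the proof).

```python
from math import isqrt


def prime_factors(n: int):
    """List of primes whose product is n, following the strong induction."""
    if n < 2:
        raise ValueError("n must be at least 2")
    for a in range(2, isqrt(n) + 1):
        if n % a == 0:
            # n = a * b with 2 <= a < n and 2 <= b < n: both are
            # predecessors of n, so the recursion applies to each factor.
            return prime_factors(a) + prime_factors(n // a)
    return [n]          # no divisor up to sqrt(n), so n is itself prime


if __name__ == "__main__":
    print(prime_factors(168), prime_factors(60))
```

Notice that the recursive calls are on $a$ and $n/a$, not on $n - 1$, which is exactly why the strong form of induction is the natural one here.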

The reason why strong induction is more convenient here is that it is more
natural to use $S(a)$ and $S(b)$ than to use $S(n - 1)$; indeed, it is not at all clear
how to use $S(n - 1)$.

The next result says that we can always factor out a largest power of 2 from 
any integer. Of course, this follows easily from the Fundamental Theorem of 
Arithmetic, but we prove the proposition to illustrate further situations in which 
strong induction is more appropriate than the first form. 

Proposition 2.19. Every integer $n \ge 1$ has a unique factorization $n = 2^k m$,
where $k \ge 0$ and $m \ge 1$ is odd.

Proof. We use strong induction on $n \ge 1$ to prove the existence of $k$ and $m$. If
$n = 1$, take $k = 0$ and $m = 1$. For the inductive step $n > 1$, we distinguish
two cases. If $n$ is odd, take $k = 0$ and $m = n$. If $n$ is even, then $n = 2b$.
Since $b < n$, it is a predecessor of $n$, and so the inductive hypothesis allows us
to assume $b = 2^{\ell} m$, where $\ell \ge 0$ and $m$ is odd. The desired factorization is
$n = 2b = 2^{\ell + 1} m$.

To prove uniqueness (induction is not needed here), suppose that $2^k m =
n = 2^t m'$, where both $k$ and $t$ are nonnegative and both $m$ and $m'$ are odd.
We may assume that $k \ge t$. If $k > t$, then canceling $2^t$ from both sides gives
$2^{k-t} m = m'$. Since $k - t > 0$, the left side is even while the right side is odd;
this contradiction shows that $k = t$. We may thus cancel $2^k$ from both sides,
leaving $m = m'$. ■
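
The existence half of this proof is again a recursive recipe. The Python sketch below follows it case by case; the function name `split_power_of_two` is a hypothetical choice of ours.

```python
def split_power_of_two(n: int):
    """Return (k, m) with n == 2**k * m, k >= 0, and m odd."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if n % 2 == 1:
        return 0, n                       # n odd (including n = 1): k = 0, m = n
    k, m = split_power_of_two(n // 2)     # n = 2b with b < n a predecessor
    return k + 1, m                       # b = 2^l * m gives n = 2^(l+1) * m


if __name__ == "__main__":
    print(split_power_of_two(168))
```

The recursion jumps from $n$ to $n/2$ rather than to $n - 1$, so the hypothesis it relies on is the strong one.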

Exercises 

2.12 (i) Prove that an integer $a \ge 2$ is a perfect square if and only if whenever $p$ is
prime and $p \mid a$, then $p^2 \mid a$.

(ii) Prove that if an integer $z \ge 2$ is a perfect square and $d^4 \mid z^2$, then $d^2 \mid z$.

2.13 Let $a$ and $b$ be relatively prime positive integers. If $ab$ is a perfect square, prove
that both $a$ and $b$ are perfect squares.

2.14 * Let $a, b, c, n$ be positive integers with $ab = c^n$. Prove that if $a$ and $b$ are
relatively prime, then both $a$ and $b$ are $n$th powers; that is, there are positive
integers $k$ and $\ell$ with $a = k^n$ and $b = \ell^n$.

2.15 * For any prime $p$ and any positive integer $n$, denote the highest power of $p$
dividing $n$ by $O_p(n)$. That is,

$$O_p(n) = e,$$

where $p^e \mid n$ but $p^{e+1} \nmid n$. If $m$ and $n$ are positive integers, prove that

(i) $O_p(mn) = O_p(m) + O_p(n)$

(ii) $O_p(m + n) \ge \min\{O_p(m), O_p(n)\}$. When does equality occur?

There is a generalization of Exercise 1.6 on page 6. Using a (tricky) inductive
proof (see FCAA [26], p. 11), we can prove the Inequality of the Means: if $n \ge 2$
and $a_1, \ldots, a_n$ are positive numbers, then

$$\sqrt[n]{a_1 \cdots a_n} \le \tfrac{1}{n}(a_1 + \cdots + a_n).$$

2.16 (i) Using the Inequality of the Means for $n = 3$, prove, for all triangles having a
given perimeter, that the equilateral triangle has the largest area.

Hint. Use Heron’s Formula for the area $A$ of a triangle with sides of lengths
$a, b, c$: if the semiperimeter is $s = \tfrac{1}{2}(a + b + c)$, then

$$A^2 = s(s - a)(s - b)(s - c).$$

(ii) What conditions on $a$, $b$, and $c$ ensure that Heron’s formula produces 0?
Interpret geometrically.

2.17 Let $a$, $b$, and $c$ be positive numbers with $a \ge b \ge c$, and let $L = \tfrac{1}{3}(a + b + c)$.

(i) Show that either $a \ge b \ge L \ge c$ or $a \ge L \ge b \ge c$.

(ii) Assume that $a \ge b \ge L \ge c$. Show that

$$L^3 - (L - b)^2 c - (L - b)(L - c)c - (L - c)^2 L = abc.$$

(iii) Use part (ii) to prove the Inequality of the Means for three variables.

(iv) Show that a box of dimensions $a \times b \times c$ can be cut up to fit inside a cube of
side length $L$ with something left over.


Why isn’t the first form of 
induction convenient here? 


Corollary 2.11 guarantees
that $O_p$ is well-defined.

Defining $0! = 1$ allows us
to write the coefficient of
$x^n$ in Eq. (2.2) as $a_n/n!$
for all $n \ge 0$.


Differential Equations 

You may have seen differential equations in other courses. If not, don’t worry; 
the next example is self-contained. 

Definition. A differential equation is an equation involving a function $y =
y(x)$ and its derivatives; a solution is a function $y$ that satisfies the equation.

Solving a differential equation generalizes indefinite integration: $\int f(x)\,dx$
is a solution to the differential equation $y' = f$. There may be many solutions:
for example, if $y = F(x)$ is an indefinite integral $\int f(x)\,dx$, then so is
$F(x) + c$ for any constant $c$.

Assume that a differential equation has a solution $y$ that is a power series.
Because factorials occur in the coefficients of Taylor series, let’s write a
solution in the form

$$y(x) = a_0 + a_1 x + \frac{a_2}{2!}x^2 + \cdots + \frac{a_n}{n!}x^n + \cdots. \tag{2.2}$$

We ignore questions of convergence. Of course, some power series diverge,
but we are doing algebra here!

Induction arises here because we can often find $y$ by relating its coefficients
$a_{n-1}$ and $a_n$.


Example 2.20. Consider the differential equation $y' = y$; that is, we seek a
function equal to its own derivative (do you know such a function?). Assuming
that $y$ is a power series, then $y$ has an expression as in Eq. (2.2). Using
term-by-term differentiation, we see that

$$y' = a_1 + a_2 x + \frac{a_3}{2!}x^2 + \cdots + \frac{a_n}{(n-1)!}x^{n-1} + \cdots,$$

so that

$$\frac{a_n}{(n-1)!}x^{n-1} = \frac{a_{n-1}}{(n-1)!}x^{n-1} \quad\text{for all } n \ge 1;$$

that is, $a_1 = a_0$, $a_2 = a_1$, and, in fact, $a_n = a_{n-1}$ for all $n \ge 1$. Rewrite the
equations: there is no restriction on $a_0$ and, for small $n$, we see that $a_n = a_0$.
If this were true for all $n$, then

$$y(x) = a_0\left(1 + x + \tfrac{1}{2}x^2 + \cdots + \tfrac{1}{n!}x^n + \cdots\right) = a_0 e^x.$$

It is true that $a_n = a_0$ for all $n$; one proof is by induction (see Exercise 2.18
on page 62). ▲


Differential equations often arise with initial conditions: values of $y(0)$,
$y'(0)$, $y''(0)$, ... are specified.

If $y$ is given by a power series $\sum (a_n/n!)x^n$, then $y(0) = a_0$. Thus, the
initial condition $y(0) = 1$ chooses the solution $y = e^x$ in the preceding example.

The next example shows how strong induction can be used in solving dif- 
ferential equations. 
Example 2.21. Consider the differential equation

$$y'' = y' + 2y \tag{2.3}$$

with initial conditions

$$y(0) = 2 \quad\text{and}\quad y'(0) = 1.$$

Again, let’s see if there is a power series solution

$$y(x) = a_0 + a_1 x + \frac{a_2}{2!}x^2 + \cdots.$$

Substituting $y$ into Eq. (2.3) and equating like powers of $x$ gives

$$a_n = a_{n-1} + 2a_{n-2}.$$

Tabulating $a_n$ for a few values shows a pattern. All the outputs $a_n$ seem to be
1 away from a power of 2, either 1 more or 1 less. The first two entries record
the initial conditions.

     n     a_n
     0     2
     1     1
     2     5 = 1 + 2·2
     3     7 = 5 + 2·1
     4     17 = 7 + 2·5
     5     31 = 17 + 2·7
     6     65
     7     127
     8     257
     9     511
    10     1025

Looking closer, the coefficients seem to satisfy $a_n = 2^n + (-1)^n$. Inductive
reasoning suggests the conjecture, and mathematical induction is a natural way
to prove it. But there is a problem. The inductive step for $a_n$ involves not only
$a_{n-1}$, but $a_{n-2}$ as well. Strong Induction to the rescue! Before dealing with the
details, we show that if the formula can be proved to hold for all $a_n$, then we
can complete our discussion of the differential equation. A solution is

$$\begin{aligned}
y(x) &= \sum_{n=0}^{\infty} \frac{2^n + (-1)^n}{n!}x^n \\
&= \sum_{n=0}^{\infty} \frac{1}{n!}(2x)^n + \sum_{n=0}^{\infty} \frac{1}{n!}(-x)^n \\
&= e^{2x} + e^{-x}.
\end{aligned}$$

You can check that $e^{2x} + e^{-x}$ works by substituting it into Eq. (2.3); we have
solved the differential equation. ▲
The proof of the equation relating the coefficients is by Strong Induction.


Proposition 2.22. Suppose, for all $n \ge 0$, that $a_n$ satisfies

$$a_n = \begin{cases} 2 & n = 0 \\ 1 & n = 1 \\ a_{n-1} + 2a_{n-2} & n > 1. \end{cases}$$

Then $a_n = 2^n + (-1)^n$ for all integers $n \ge 0$.


Proof. Because the definition has two initial values, we need to check two base
steps:

$$a_0 = 2 = 2^0 + (-1)^0 \quad\text{and}\quad a_1 = 1 = 2^1 + (-1)^1.$$

If $n > 1$ and $a_k = 2^k + (-1)^k$ for all the predecessors of $n$, $0 \le k < n$, then

$$\begin{aligned}
a_n &= a_{n-1} + 2a_{n-2} \\
&= \left(2^{n-1} + (-1)^{n-1}\right) + 2\left(2^{n-2} + (-1)^{n-2}\right) \\
&= \left(2^{n-1} + (-1)^{n-1}\right) + \left(2 \cdot 2^{n-2} + 2 \cdot (-1)^{n-2}\right) \\
&= \left(2^{n-1} + (-1)^{n-1}\right) + \left(2^{n-1} + 2 \cdot (-1)^{n-2}\right) \\
&= \left(2^{n-1} + 2^{n-1}\right) + \left((-1)^{n-1} + 2 \cdot (-1)^{n-2}\right) \\
&= 2 \cdot 2^{n-1} + (-1)^{n-2}(-1 + 2) \\
&= 2^n + (-1)^{n-2} \\
&= 2^n + (-1)^n.
\end{aligned}$$ ■
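
The table in Example 2.21 and the closed form in Proposition 2.22 can both be reproduced by a short computation. This Python sketch is illustrative only; the names `coefficients` and `closed_form` are ours.

```python
def coefficients(count: int):
    """First `count` terms of a_0 = 2, a_1 = 1, a_n = a_(n-1) + 2 a_(n-2)."""
    a = [2, 1]
    while len(a) < count:
        # Each new term needs the two preceding terms, which is why
        # the proof uses two base steps and strong induction.
        a.append(a[-1] + 2 * a[-2])
    return a[:count]


def closed_form(n: int) -> int:
    """The conjectured formula a_n = 2^n + (-1)^n."""
    return 2 ** n + (-1) ** n


if __name__ == "__main__":
    for n, a_n in enumerate(coefficients(11)):
        print(n, a_n, closed_form(n))
```

Agreement on finitely many terms only supports the conjecture; the strong induction above is what establishes it for every $n \ge 0$.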


Exercises 

2.18 * Complete the discussion in Example 2.20: show that if

$$y(x) = a_0 + a_1 x + \frac{a_2}{2!}x^2 + \cdots + \frac{a_n}{n!}x^n + \cdots$$

and $y' = y$, then $a_n = a_0$.

2.19 Assume that “term-by-term” differentiation holds for power series: if $f(x) =
c_0 + c_1 x + c_2 x^2 + \cdots + c_n x^n + \cdots$, then the power series for the derivative
$f'(x)$ is

$$f'(x) = c_1 + 2c_2 x + 3c_3 x^2 + \cdots + nc_n x^{n-1} + \cdots.$$


Here is an instance in
which it is convenient to
write $0! = 1$.


(i) Prove that $f(0) = c_0$.

(ii) Prove, for all $n \ge 0$, that if $f^{(n)}$ is the $n$th derivative of $f$, then

$$f^{(n)}(x) = n!\,c_n + (n + 1)!\,c_{n+1}x + x^2 g_n(x),$$

where $g_n(x)$ is a power series.

(iii) Prove that $c_n = f^{(n)}(0)/n!$ for all $n \ge 0$. (This is Taylor’s formula.)

This exercise shows why, in Example 2.21, power series were denoted by
$a_0 + a_1 x + (a_2/2!)x^2 + (a_3/3!)x^3 + \cdots$.
2.20 Find the solution to the differential equation

$$2y'' - y' - 3y = 0$$


subject to the initial conditions y(0) = y (1) = 1. 
Answer, y = le~ x + 


2.2 Binomial Theorem 

We now look at a result, important enough to deserve its own section, which
involves both mathematical induction and inductive reasoning. What is the
pattern of the coefficients in the formulas for the powers $(1 + x)^n$ of the binomial
$1 + x$? Let

$$(1 + x)^n = c_0 + c_1 x + c_2 x^2 + \cdots + c_n x^n.$$


Definition. The coefficients $c_r$ are called binomial coefficients:

$\binom{n}{r}$ is the coefficient $c_r$ of $x^r$ in $(1 + x)^n$.


Euler introduced the
notation $\left(\frac{n}{r}\right)$, and this
symbol evolved into $\binom{n}{r}$,
which is generally used
today.


The binomial coefficient $\binom{n}{r}$ is pronounced “n choose r” because it also
arises in counting problems, as we shall soon see. Thus,

$$(1 + x)^n = \sum_{r=0}^{n} \binom{n}{r} x^r.$$

For example,

$$\begin{aligned}
(1 + x)^0 &= 1 \\
(1 + x)^1 &= 1 + 1x \\
(1 + x)^2 &= 1 + 2x + 1x^2 \\
(1 + x)^3 &= 1 + 3x + 3x^2 + 1x^3 \\
(1 + x)^4 &= 1 + 4x + 6x^2 + 4x^3 + 1x^4.
\end{aligned}$$


Etymology. Binomial means $a + b$; trinomial means $a + b + c$. But monomial
usually refers to a summand of a polynomial: either $ax^e$ for a polynomial in
one variable, or $ax_1^{e_1} \cdots x_n^{e_n}$ for a polynomial in several variables.


The following figure, called Pascal’s triangle, displays an arrangement of the
first few coefficients.

                      1
                    1   1
                  1   2   1
                1   3   3   1
              1   4   6   4   1
            1   5  10  10   5   1
          1   6  15  20  15   6   1
        1   7  21  35  35  21   7   1


In Pascal’s triangle, an inside number (i.e., not a 1 on the border) of the
$n$th row can be computed by going up to the $(n - 1)$st row and adding the two
neighboring numbers above it. For example, the inside numbers in row 4 can
be computed from row 3 as follows:

                1   3   3   1
              1   4   6   4   1

($4 = 1 + 3$, $6 = 3 + 3$, and $4 = 3 + 1$). Let’s prove that this observation always
holds.


You can also prove 
Lemma 2.23 by induc- 
tion. See Exercise 2.22 on 
page 67. 


Lemma 2.23. For all integers $n \ge 1$ and all $r$ with $0 \le r \le n$,

$$\binom{n}{r} = \begin{cases} 1 & \text{if } r = 0 \text{ or } r = n \\ \binom{n-1}{r-1} + \binom{n-1}{r} & \text{if } 0 < r < n. \end{cases}$$


Proof. The $n$th row of Pascal’s triangle is the coefficient list for $(1 + x)^n$. The
fact that the constant term and the highest degree term have coefficient 1 is
Exercise 2.21 on page 67. For the inside terms, we claim that the coefficient of
$x^r$ in $(1 + x)^n$ is the sum of two neighboring coefficients in $(1 + x)^{n-1}$. More
precisely, we claim that if

$$(1 + x)^{n-1} = c_0 + c_1 x + c_2 x^2 + \cdots + c_{n-1} x^{n-1},$$

and $0 < r < n$, then the coefficient of $x^r$ in $(1 + x)^n$ is $c_{r-1} + c_r$. We have

$$\begin{aligned}
(1 + x)^n &= (1 + x)(1 + x)^{n-1} = (1 + x)^{n-1} + x(1 + x)^{n-1} \\
&= (c_0 + \cdots + c_{n-1}x^{n-1}) + x(c_0 + \cdots + c_{n-1}x^{n-1}) \\
&= (c_0 + \cdots + c_{n-1}x^{n-1}) + (c_0 x + c_1 x^2 + \cdots + c_{n-1}x^n) \\
&= 1 + (c_0 + c_1)x + (c_1 + c_2)x^2 + \cdots.
\end{aligned}$$

Thus $\binom{n}{r} = c_{r-1} + c_r = \binom{n-1}{r-1} + \binom{n-1}{r}$. ■
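
Lemma 2.23 is precisely the rule for building each row of Pascal’s triangle from the row above it. A minimal Python sketch, with the hypothetical function name `pascal_row`, makes this explicit.

```python
def pascal_row(n: int):
    """Row n of Pascal's triangle: the coefficients of (1 + x)^n."""
    if n < 0:
        raise ValueError("n must be nonnegative")
    if n == 0:
        return [1]
    prev = pascal_row(n - 1)
    # Border entries are 1; each inside entry is the sum of the two
    # neighboring entries in the previous row (Lemma 2.23).
    inside = [prev[r - 1] + prev[r] for r in range(1, n)]
    return [1] + inside + [1]


if __name__ == "__main__":
    for n in range(8):
        print(pascal_row(n))
```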

Pascal’s triangle was known centuries before Pascal’s birth; Figure 2.7 shows 
a Chinese scroll from the year 1303 depicting it. Pascal’s contribution (around 
1650) is a formula for the binomial coefficients. 


Proposition 2.24 (Pascal). For all $n \ge 0$ and all $r$ with $0 \le r \le n$,

\[ \binom{n}{r} = \frac{n!}{r!\,(n-r)!}. \]


Figure 2.7. Pascal’s triangle, China, 1303 CE.


Proof. The proof is by induction on $n \ge 0$. If $n = 0$, then

\[ \binom{0}{0} = \frac{0!}{0!\,0!} = 1. \]


Pascal probably discovered this formula by regarding $\binom{n}{r}$ in a different way. We’ll
look at this in a moment.


For the inductive step, note first that the formula holds when $r = 0$ and
$r = n$:

\[ \binom{n}{0} = 1 = \frac{n!}{0!\,(n-0)!} \quad\text{and}\quad \binom{n}{n} = 1 = \frac{n!}{n!\,0!}. \]

Here is another place showing that defining $0! = 1$ is convenient.

If $0 < r < n$, then

\begin{align*}
\binom{n}{r} &= \binom{n-1}{r-1} + \binom{n-1}{r} && \text{(Lemma 2.23)} \\
 &= \frac{(n-1)!}{(r-1)!\,(n-r)!} + \frac{(n-1)!}{r!\,(n-r-1)!} && \text{(inductive hypothesis)} \\
 &= \frac{(n-1)!}{(r-1)!\,(n-r-1)!}\left(\frac{1}{n-r} + \frac{1}{r}\right) \\
 &= \frac{(n-1)!}{(r-1)!\,(n-r-1)!}\left(\frac{n}{r(n-r)}\right) = \frac{n!}{r!\,(n-r)!}. \qquad ■
\end{align*}

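Theorem 2.25 (Binomial Theorem). (i) For all real numbers $x$ and all integers $n \ge 0$,

\[ (1+x)^n = \sum_{r=0}^{n} \binom{n}{r} x^r = \sum_{r=0}^{n} \frac{n!}{r!\,(n-r)!}\, x^r. \]

(ii) For all real numbers $a$ and $b$ and all integers $n \ge 0$,

\[ (a+b)^n = \sum_{r=0}^{n} \binom{n}{r} a^{n-r} b^r = \sum_{r=0}^{n} \frac{n!}{r!\,(n-r)!}\, a^{n-r} b^r. \]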

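Proof. (i) This follows from replacing $\binom{n}{r}$ by Pascal’s formula in Proposition 2.24.

(ii) The result is trivially true when $a = 0$ (we have agreed that $0^0 = 1$). If
$a \ne 0$, set $x = b/a$ in part (i), and observe that

\[ \left(1 + \frac{b}{a}\right)^n = \left(\frac{a+b}{a}\right)^n = \frac{(a+b)^n}{a^n}. \]

Hence,

\[ (a+b)^n = a^n\left(1 + \frac{b}{a}\right)^n = a^n \sum_{r=0}^{n} \binom{n}{r} \frac{b^r}{a^r} = \sum_{r=0}^{n} \binom{n}{r} a^{n-r} b^r. \qquad ■ \]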
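Part (ii) of the Binomial Theorem can be spot-checked numerically. The sketch below uses Python's standard `math.comb` for $\binom{n}{r}$; the helper name `binomial_expand` and the sample triples are our own choices. Python evaluates the integer power `0 ** 0` as 1, which matches the convention $0^0 = 1$ used in the proof.

```python
from math import comb


def binomial_expand(a, b, n):
    """Right-hand side of Theorem 2.25(ii): sum of C(n, r) * a^(n-r) * b^r."""
    return sum(comb(n, r) * a ** (n - r) * b ** r for r in range(n + 1))


# A few integer samples, including a = 0, where the 0^0 = 1 convention matters.
for a, b, n in [(2, 3, 5), (-1, 4, 6), (0, 7, 3)]:
    assert binomial_expand(a, b, n) == (a + b) ** n
```

A check like this is evidence, not proof; the induction above is what covers every n.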

There are many beautiful connections between Pascal’s triangle and number
theory. For example, while it is not generally true that $n \mid \binom{n}{r}$ (for example,
$4 \nmid 6 = \binom{4}{2}$), this result is true when $n$ is prime.

Proposition 2.26. If $p$ is a prime, then $p \mid \binom{p}{r}$ for all $r$ with $0 < r < p$.


Proof. By Pascal’s formula (Proposition 2.24),

\[ \binom{p}{r} = \frac{p!}{r!\,(p-r)!} = \frac{p(p-1)\cdots(p-r+1)}{r!}, \]

and cross multiplying gives

\[ r!\binom{p}{r} = p(p-1)\cdots(p-r+1); \]

that is, $p \mid r!\binom{p}{r}$. But each factor of $r!$ is strictly less than $p$, because $r < p$,
so that $p$ is not a divisor of any of them. Therefore, Euclid’s Lemma says that
$p \nmid r!$ and, hence, that $p$ must divide $\binom{p}{r}$. ■
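Proposition 2.26 is easy to test by machine. The following sketch (the helper name is ours) lists the integers n in a small range for which n divides every inside entry of row n. The proposition guarantees that every prime shows up; that composites fail is a separate known fact, not something proved in the text above.

```python
from math import comb


def divides_all_inside(n):
    """True when n divides C(n, r) for every r with 0 < r < n."""
    return all(comb(n, r) % n == 0 for r in range(1, n))


print([n for n in range(2, 20) if divides_all_inside(n)])
```

The counterexample from the text appears here as `divides_all_inside(4)` being false, because $\binom{4}{2} = 6$.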

Example 2.27. The Binomial Theorem can be used to express the sum of the
nth powers of two variables $a$ and $b$ in terms of the “elementary symmetric
functions” $a + b$ and $ab$. Here are some examples for $n = 2, 3, 4$; from

\[ (a+b)^2 = a^2 + 2ab + b^2 \]

we have

\[ a^2 + b^2 = (a+b)^2 - 2ab. \]

From

\[ (a+b)^3 = a^3 + 3a^2 b + 3ab^2 + b^3 \]

we conclude

\[ a^3 + b^3 = (a+b)^3 - 3ab(a+b). \]

For $n = 4$,

\begin{align*}
(a+b)^4 &= a^4 + 4a^3 b + 6a^2 b^2 + 4ab^3 + b^4 \\
        &= (a^4 + b^4) + 4ab(a^2 + b^2) + 6(ab)^2.
\end{align*}

Hence,

\[ a^4 + b^4 = (a+b)^4 - 4ab(a^2 + b^2) - 6(ab)^2. \]

We can now replace $a^2 + b^2$ by the already computed expression $(a+b)^2 - 2ab$,
collect like terms, and have an expression for $a^4 + b^4$ in terms of $a + b$ and
$ab$.

We could proceed inductively, expressing $a^n + b^n$ in terms of $a + b$ and $ab$
for $n \ge 5$. Try a few more examples; you’ll get the sense that there’s a general
method expressing $a^n + b^n$ in terms of $a + b$, $ab$, and other terms like $a^k + b^k$
with $k < n$. ▲
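For the case n = 4, the substitution described in Example 2.27 can be carried out explicitly. The following is a worked sketch; the intermediate grouping is ours.

```latex
\begin{align*}
a^4 + b^4 &= (a+b)^4 - 4ab\,(a^2 + b^2) - 6(ab)^2 \\
          &= (a+b)^4 - 4ab\,\bigl[(a+b)^2 - 2ab\bigr] - 6(ab)^2 \\
          &= (a+b)^4 - 4ab\,(a+b)^2 + 2(ab)^2 .
\end{align*}
```

As a quick check, $a = 1$ and $b = 2$ give $1 + 16 = 17$ on the left and $81 - 72 + 8 = 17$ on the right.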

Exercises 

2.21 * Show, without using the Binomial Theorem, that, if $n \ge 0$ is an integer, then

(i) the degree of $(1+x)^n$ is $n$

(ii) the leading coefficient of $(1+x)^n$ is 1

(iii) the constant term of $(1+x)^n$ is 1.

2.22 Prove Lemma 2.23 by induction on n > 0. 

2.23 Prove that the binomial coefficients are symmetric: for all $r$ with $0 \le r \le n$,

\[ \binom{n}{r} = \binom{n}{n-r}. \]

2.24 Find a formula for the sum of the entries in the nth row of Pascal’s triangle and
prove your assertion.

2.25 If $n \ge 1$, find a formula for the alternating sum of the binomial coefficients in the
nth row of Pascal’s triangle:

\[ \binom{n}{0} - \binom{n}{1} + \binom{n}{2} - \cdots + (-1)^n \binom{n}{n}. \]

Prove what you say.

Hint. Consider $f(x) = (1+x)^n$ when $x = -1$.

2.26 If $n \ge 1$, find a formula for the sum of the squares of the binomial coefficients in
the nth row of Pascal’s triangle:

\[ \binom{n}{0}^2 + \binom{n}{1}^2 + \cdots + \binom{n}{n}^2. \]

Prove what you say.

2.27 Prove, for a given $n \ge 1$, that the sum of all the binomial coefficients $\binom{n}{r}$ with $r$
even is equal to the sum of all those $\binom{n}{r}$ with $r$ odd.

2.28 The triangular numbers count the number of squares in a staircase of height $n$.
Figure 2.8 displays the staircases of height $n$ for $1 \le n \le 5$.

Figure 2.8. Triangular numbers.

(i) Find a formula for the nth triangular number in terms of binomial coefficients,
and prove your assertion. Compare this exercise with the discussion of the
$(n+1) \times (n+1)$ square in Example 2.6.

(ii) Show that the sum of two consecutive triangular numbers is a perfect square. 

2.29 Take It Further. Using the notation of Example 2.27, use the Binomial Theorem
and induction to show that $a^n + b^n$ can be expressed in terms of $a + b$ and $ab$.

2.30 Pascal’s triangle enjoys a sort of hockey stick property: if you start at the end of 
any row and draw a hockey stick along a diagonal, as in Figure 2.9, the sum of the 
entries on the handle of the stick is the entry at the tip of the blade. Express the 
hockey stick property as an identity involving binomial coefficients and prove the 
identity. 

2.31 (Leibniz) A function $f : \mathbb{R} \to \mathbb{R}$ is called a $C^\infty$-function if it has an nth derivative
$f^{(n)}(x)$ for every $n \ge 1$. Prove that if $f$ and $g$ are $C^\infty$-functions, then

\[ (fg)^{(n)}(x) = \sum_{k=0}^{n} \binom{n}{k} f^{(k)}(x) \cdot g^{(n-k)}(x). \]

In spite of the strong resemblance, there is no routine derivation of the Leibniz
formula from the Binomial Theorem (there is a derivation using an idea from
hypergeometric series).

Figure 2.9. Hockey sticks.


2.32 * If $p$ is a prime and $a$ and $b$ are integers, prove that there is an integer $c$ with

\[ (a+b)^p = a^p + b^p + pc. \]


Combinatorics 

Binomial coefficients have a combinatorial interpretation. Given a set $X$ with
$n$ elements, define an r-subset of $X$ to be a subset having exactly $r$ elements.
How many r-subsets of $X$ are there?

Example 2.28. There are ten 3-element subsets of the 5-element set $X =
\{A, B, C, D, E\}$. Think of forming a 3-person committee from 5 people. A committee
either contains Elvis or doesn’t. The committees are

{A,B,C}  {A,B,D}  {B,C,D}  {A,C,D}

{A,B,E}  {A,C,E}  {A,D,E}  {B,C,E}  {B,D,E}  {C,D,E}

The first row consists of the 3-subsets that don’t contain Elvis (there are four
such); the second row displays the 3-subsets that do contain Elvis (there are
six of these). ▲

In general, if $X$ has $n$ elements and $0 \le r \le n$, denote the number of its
r-element subsets by

\[ [n, r]; \]

that is, $[n, r]$ is the number of ways one can choose $r$ things from a box of $n$
things. Note that:

(i) $[n, 0] = 1$ (there’s only one 0-subset, the empty set $\varnothing$).

(ii) $[n, n] = 1$ (there’s only one n-subset of $X$, $X$ itself).

When $n = 0$, items (i) and (ii) give the same answer. Why does this
make sense?

If $0 < r < n$, you can compute $[n, r]$ using the committee idea in Example 2.28.
If $X = \{a_1, a_2, \ldots, a_n\}$ and you want to build an r-subset, first
choose a “distinguished” element of $X$, say $a_n$, and call him Elvis. Either your
subset contains Elvis or it doesn’t.

Case 1. If Elvis is in your r-subset, then you must pick $r - 1$ elements from
the remaining $n - 1$; by definition, there are $[n-1, r-1]$ ways to do this.

Case 2. If Elvis is not in your r-subset, then you must pick all $r$ elements from
the remaining $n - 1$; there are $[n-1, r]$ ways to do this.

It follows that $[n, r] = [n-1, r-1] + [n-1, r]$. We have proved the following
result.

Lemma 2.29. For all integers $n \ge 1$ and all $r$ with $0 \le r \le n$,

\[
[n, r] =
\begin{cases}
1 & \text{if } r = 0 \text{ or } r = n \\
[n-1, r-1] + [n-1, r] & \text{if } 0 < r < n.
\end{cases}
\]
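The “Elvis” argument can be replayed by brute force. This sketch enumerates all r-subsets with `itertools.combinations` and splits them by whether they contain the last element; the function name and the choice of the last element as Elvis are illustrative assumptions.

```python
from itertools import combinations


def count_subsets_by_elvis(elements, r):
    """Count r-subsets of `elements`, split by whether they contain the last
    element (playing the role of Elvis). Returns (with_elvis, without_elvis)."""
    elvis = elements[-1]
    with_elvis = without_elvis = 0
    for subset in combinations(elements, r):
        if elvis in subset:
            with_elvis += 1       # Case 1: counts [n-1, r-1]
        else:
            without_elvis += 1    # Case 2: counts [n-1, r]
    return with_elvis, without_elvis


print(count_subsets_by_elvis("ABCDE", 3))   # the committees of Example 2.28
```

For the five-element set of Example 2.28 the two counts are 6 and 4, which add up to the ten committees listed there.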

The similarity between Lemmas 2.23 and 2.29 inspires the next theorem.
It is also the reason why the binomial coefficient $\binom{n}{r}$ is usually pronounced
“n choose r.”



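Theorem 2.30 (Counting Subsets). If $n$ and $r$ are nonnegative integers with
$0 \le r \le n$, then

\[ \binom{n}{r} = [n, r]. \]

Proof. Use induction on $n \ge 0$. If $n = 0$, the inequality $0 \le r \le n$ forces
$r = 0$, and

\[ \binom{0}{0} = 1 = [0, 0]. \]

Suppose the result is true for $n - 1$. If $r = 0$ or $r = n$, both sides equal 1.
If $0 < r < n$, then

\begin{align*}
\binom{n}{r} &= \binom{n-1}{r-1} + \binom{n-1}{r} && \text{Lemma 2.23} \\
             &= [n-1, r-1] + [n-1, r] && \text{inductive hypothesis} \\
             &= [n, r] && \text{Lemma 2.29.} \qquad ■
\end{align*}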


Theorems can often be proved in several ways. The following discussion 
gives another proof of Theorem 2.30 using Pascal’s formula (Proposition 2.24) 
instead of Lemmas 2.23 and 2.29. 

We first compute $[n, r]$ by considering a related question.


Definition. Given an “alphabet” with $n$ (distinct) letters and an integer $r$ with
$1 \le r \le n$, an r-anagram is a sequence of $r$ of these letters with no repetitions.

For example, the 2-anagrams on the alphabet a, b, c are

ab, ba, ac, ca, bc, cb

(note that aa, bb, cc are not on this list). How many r-anagrams are there on an
alphabet with $n$ letters? We count the number of such anagrams in two ways.

(i) There are $n$ choices for the first letter; since no letter is repeated, there are
only $n - 1$ choices for the second letter, only $n - 2$ choices for the third
letter, and so forth. Thus, the number of r-anagrams is

\[ n(n-1)(n-2)\cdots(n-(r-1)) = n(n-1)(n-2)\cdots(n-r+1). \]

In the special case $n = r$, the number of n-anagrams on $n$ letters is $n!$.

(ii) Here is a second way to count the anagrams. First choose an r-subset of
the alphabet (consisting of $r$ letters); there are $[n, r]$ ways to do this, for
this is exactly what the symbol $[n, r]$ means. For each chosen r-subset,
there are $r!$ ways to arrange the $r$ letters in it (this is the special case of
our first count when $n = r$). The number of r-anagrams is thus

\[ r!\,[n, r]. \]


We conclude that

\[ r!\,[n, r] = n(n-1)(n-2)\cdots(n-r+1), \]

from which it follows that

\[ [n, r] = \frac{n(n-1)(n-2)\cdots(n-r+1)}{r!} = \frac{n!}{(n-r)!\,r!}. \]

Therefore, Pascal’s formula gives

\[ [n, r] = \binom{n}{r}. \]
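The two ways of counting r-anagrams can be compared directly on a small alphabet. The sketch below is a brute-force check under our own naming: `itertools.permutations(alphabet, r)` generates exactly the sequences of r distinct letters, and `itertools.combinations` generates the r-subsets.

```python
from itertools import combinations, permutations
from math import factorial


def count_anagrams(alphabet, r):
    """Number of r-anagrams: sequences of r distinct letters from the alphabet."""
    return sum(1 for _ in permutations(alphabet, r))


def falling_product(n, r):
    """n(n-1)(n-2)...(n-r+1), the count from method (i)."""
    product = 1
    for k in range(r):
        product *= n - k
    return product


alphabet = "abcde"
n, r = len(alphabet), 3
subsets = sum(1 for _ in combinations(alphabet, r))   # this is [n, r]

# Method (i) and method (ii) count the same set of anagrams.
assert count_anagrams(alphabet, r) == falling_product(n, r) == factorial(r) * subsets
```

With n = 5 and r = 3 all three quantities equal 60, so $[5, 3] = 60/3! = 10$.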



If you piece together the results of this section, you’ll see that we have 
shown that the following ways to define binomial coefficients are all equiva- 
lent: starting from any one of them, you can derive the others. 

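Algebraic: $\binom{n}{r}$ is the coefficient of $x^r$ in the expansion of $(1+x)^n$.

Pascal:

\[ \binom{n}{r} = \frac{n!}{r!\,(n-r)!}. \]

Combinatorial: $\binom{n}{r}$ is the number of r-element subsets of an n-element set.

Inductive:

\[
\binom{n}{r} =
\begin{cases}
1 & \text{if } r = 0 \text{ or } r = n \\[4pt]
\binom{n-1}{r-1} + \binom{n-1}{r} & \text{if } 0 < r < n.
\end{cases}
\]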


Example 2.31. If you replace the symbols by their definitions, Theorem 2.30
says something that is far from obvious: the coefficient of $x^r$ in $(1+x)^n$ is
the same as the number of r-element subsets of an n-element set. The proof by
induction of Theorem 2.30 establishes this, but many people are left wondering
if there is a more intuitive reason why the expansion of $(1+x)^n$ contains all
the information about subsets of various sizes from an n-element set.

If you were going to multiply out $(1+x)^5$ by hand, you could view the
calculation like this:

\[ (1+x)^5 = (1+x)(1+x)(1+x)(1+x)(1+x). \]

The expansion is carried out by taking one term (1 or $x$) from each binomial
factor $1 + x$, multiplying them together, and then collecting like powers of $x$.

For example, you could take a “1” from each of the first three binomials and
an $x$ from the last two. That would produce $1 \cdot 1 \cdot 1 \cdot x \cdot x = x^2$. But that’s
not the only way to get an $x^2$. You could have taken an $x$ from the first and
third binomials and 1 from the rest. Or an $x$ from the first two binomials and 1
from the last three. Do this in every possible way; the coefficient of $x^2$ in the
expansion will be the number of ways you can pick two binomials from the set
of five to be “x terms.” And there are precisely $10 = [5, 2]$ ways to do this.
Generalizing, view $(1+x)^n$ as a product of $n$ binomials:

\[ (1+x)^n = \underbrace{(1+x)(1+x)(1+x)\cdots(1+x)}_{n \text{ times}}. \]

The coefficient of $x^r$ in this product is the number of ways you can choose $r$ of
the binomials to be “x terms” (and the rest to be 1). This number is precisely
$[n, r]$. Hence

\[ (1+x)^n = \sum_{r=0}^{n} [n, r]\, x^r. \]

When combined with the definition of binomial coefficients on page 63, this
gives another proof that $\binom{n}{r} = [n, r]$. ▲
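The “pick 1 or x from each factor” description of the expansion translates directly into a small enumeration. In this sketch (names are ours) each tuple of 0s and 1s records which factors contributed an x, and the exponent of the resulting monomial is the number of 1s.

```python
from collections import Counter
from itertools import product


def expand_by_choices(n):
    """Coefficients of (1 + x)^n, found by picking 1 or x from each of n factors."""
    counts = Counter()
    for picks in product((0, 1), repeat=n):   # 0 = take "1", 1 = take "x"
        counts[sum(picks)] += 1               # exponent = number of x's picked
    return [counts[r] for r in range(n + 1)]


print(expand_by_choices(5))
```

The enumeration visits all $2^n$ choices, so it is only practical for small n; its purpose is to make the bijection between monomials $x^r$ and r-subsets of the factors visible.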


Exercises 

2.33 * 

(i) For each value of $r$, $0 \le r \le 4$, how many r-element subsets of the set
{A, B, C, D} are there?

(ii) For each value of $r$, $0 \le r \le 5$, how many r-element subsets of the set
{A, B, C, D, E} are there?

2.34 How many subsets (of any size) are there in an n-element set? Prove your
assertion.

2.35 Show that 



Hint. Split a 2n -element set into two equal pieces. 



2.36 Show that

2.37 If $m$, $n$, and $r$ are nonnegative integers, prove Vandermonde’s Identity:

\[ \binom{m+n}{r} = \sum_{k=0}^{r} \binom{m}{k}\binom{n}{r-k}. \]

Hint. $(1+x)^{m+n} = (1+x)^m (1+x)^n$.


2.38 Show that

\[ \ldots = n\,2^{n-1}. \]


2.39 How many ways can you choose two hats from a closet containing 14 different 
hats? (One of our friends does not like the phrasing of this exercise. After all, 
you can choose two hats with your left hand, with your right hand, with your 
teeth, ... but we continue the evil tradition.)

2.40 Let D be a collection of ten different dogs, and let C be a collection of ten dif- 
ferent cats. Prove that there are the same number of quartets of dogs as there are 
sextets of cats. 

2.41 (i) What is the coefficient of $x^{16}$ in $(1+x)^{20}$?

(ii) How many ways are there to choose 4 colors from a palette containing paints 
of 20 different colors? 


2.42 A weekly lottery asks you to select 5 different numbers between 1 and 45. At the 
week’s end, 5 such numbers are drawn at random, and you win the jackpot if all 
your numbers match the drawn numbers. What is your chance of winning? 

The number of selections of 5 numbers is “45 choose 5”, which is $\binom{45}{5} =
1{,}221{,}759$. The odds against your winning are more than a million to one.
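The lottery count is a one-line computation with the standard library. This sketch assumes, as the exercise does, that each 5-number selection is equally likely; the variable names are ours.

```python
from fractions import Fraction
from math import comb

tickets = comb(45, 5)            # number of ways to choose 5 numbers from 45
chance = Fraction(1, tickets)    # exact probability of matching all five

print(tickets)                   # the count quoted in the text
print(float(chance))             # the same chance as a decimal
```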


2.3 Connections 


An Approach to Induction 

Teaching mathematical induction to high school students is often tough. In 
particular, many students fall into the trap we described on page 49: in spite 
of all our explanations to the contrary, they think that the inductive hypothesis 
assumes what it is they are supposed to be proving. In this section, we look at 
a well-tested method that avoids this trap. 

Suppose you ask a class to come up with a function that agrees with the 
table 


Input   Output
  0        4
  1        7
  2       10
  3       13
  4       16
  5       19


We’ve found that about half a high school class (beginning algebra, say) comes
up with a closed form definition, something like $f(n) = 3n + 4$, while the other
half comes up with an inductive definition, something like “start with 4, and
each output is 3 more than the previous one.” This inductive definition can be
written more formally:

\[
g(n) =
\begin{cases}
4 & \text{if } n = 0 \\
g(n-1) + 3 & \text{if } n > 0.
\end{cases}
\]

Of course, beginning students don’t usually write the definition of $g$
using this case notation. Technology can help get them used to it.

The part of $g$’s definition involving $n$, the equation $g(n) = g(n-1) + 3$, is
called a recurrence.

That inductive definitions (or recursive definitions , as they are often called) 
seem to be, in some sense, natural for students, can be exploited to help stu- 
dents understand proof by induction. 

Computer algebra systems (CAS) let you model the two definitions. Build- 
ing such computational models allows students to experiment with both func- 
tions, and it also provides an opportunity to launch some important ideas. For 
example, a teacher can use the models to discuss the domain of each function:
$f$ accepts any real number, but $g$ will accept only nonnegative integers.

Here’s what’s most germane to this section. If you try some values in a
spreadsheet or calculator, it seems, for a while, that $f$ and $g$ produce the same
output when given the same input. But at some point (exactly where depends
on the system), $f$ outputs an integer but $g$ surrenders. Suppose this happens,
for example, at 255: both functions return 766 at 254, but $f(255) = 769$
while $g$ returns an error. Is this because $f$ and $g$ really are not equal at 255?
Or is it because of the limitations of the technology? Many students will say
immediately that the functions are equal when $n = 255$; it’s just that the
computer can’t compute the value of $g$ there. Tell the students, “I believe you that
$f(254) = g(254)$. Convince me that they are also equal at 255.”

After some polishing and a little help, their argument usually goes some- 
thing like this. 

g(255) = g(254) + 3          (this is how g is defined)

       = f(254) + 3          (the calculator said so: they both output 766)

       = (3 · 254 + 4) + 3   (this is how f is defined)

       = (3 · 254 + 3) + 4   (algebra)

       = 3(254 + 1) + 4      (more algebra)

       = 3 · 255 + 4         (arithmetic)

       = f(255)              (this is how f is defined).

There’s nothing special about 255 here. If you had a more powerful calculator,
one that handled inputs to g up to, say, 567 (and then crashed at 568), you
could argue that f and g were equal at 568, too.

g(568) = g(567) + 3          (this is how g is defined)

       = f(567) + 3          (the powerful calculator said so)

       = (3 · 567 + 4) + 3   (this is how f is defined)

       = (3 · 567 + 3) + 4   (algebra)

       = 3(567 + 1) + 4      (more algebra)

       = 3 · 568 + 4         (arithmetic)

       = f(568)              (this is how f is defined).

Computing g(568) is the same as computing g(255). Now imagine that you
had a virtual calculator, one that showed that f and g agreed up to some integer
n - 1, but then crashed when you asked for g(n). You could show that f and
g are equal at n by the same argument:

g(n) = g(n - 1) + 3            (this is how g is defined)

     = f(n - 1) + 3            (the virtual calculator said so)

     = (3 · (n - 1) + 4) + 3   (this is how f is defined)

     = 3n + 4                  (algebra)

     = f(n)                    (this is how f is defined).

So, every time f and g are equal at one integer, they are equal at the next
one. Since f and g are equal at 0 (in fact, since they are equal at every integer
between 0 and 254), they are equal at every nonnegative integer.

This argument is the essence of mathematical induction. In the example, it
shows that if two functions f and g are equal at one integer, then they are
equal at the next one. Coupled with the fact that they are equal at 0, it makes
sense that they are equal for all integers greater than or equal to 0; that is,
f(n) = g(n) for all nonnegative integers n.

We have seen that induction applies in much more general situations than 
this one. But this simple context is quite effective in starting students onto a 
path that helps them understand induction. 
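The classroom experiment described in this section can be modeled in any language with recursion. The sketch below uses Python rather than a CAS, which is our assumption; the exact input at which the recursive model gives up depends on the interpreter's recursion limit, just as the text says it depends on the system. The helper `g_or_none` is a hypothetical convenience for observing that failure without crashing.

```python
def f(n):
    """Closed-form definition."""
    return 3 * n + 4


def g(n):
    """Inductive (recursive) definition: start with 4, add 3 each step."""
    if n == 0:
        return 4
    return g(n - 1) + 3


def g_or_none(n):
    """Return g(n), or None when the recursion is too deep for this system."""
    try:
        return g(n)
    except RecursionError:
        return None


# The two models agree wherever the recursive one can be evaluated ...
assert all(f(n) == g(n) for n in range(255))

# ... but far enough out, f still answers while g "surrenders".
print(f(10 ** 6), g_or_none(10 ** 6))
```

The failure of `g` at large inputs says nothing about whether the functions are equal there; that is exactly the gap the inductive argument fills.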


Fibonacci Sequence 

Many interesting investigations in high school center around the following se- 
quence, which describes a pattern frequently found in nature and in art. 


Definition. The Fibonacci sequence is defined by:

\[
F_n =
\begin{cases}
0 & n = 0 \\
1 & n = 1 \\
F_{n-1} + F_{n-2} & n > 1.
\end{cases}
\]

There are two base steps in the definition: $n = 0$ and $n = 1$. The Fibonacci
sequence begins: 0, 1, 1, 2, 3, 5, 8, 13, ....


Historical Note. The Fibonacci sequence is related to the golden ratio, a
number mentioned in Euclid, Book 6, Proposition 30. It is said that the ancient
Greeks thought that a rectangular figure is most pleasing to the eye (such
rectangles can be seen in the Parthenon in Athens) if its edges $a$ and $b$ are in the
proportion

\[ \frac{a}{b} = \frac{b}{a+b}. \]

In this case, $a(a+b) = b^2$, so that $b^2 - ab - a^2 = 0$; that is, $(b/a)^2 - (b/a) - 1
= 0$. The quadratic formula gives $b/a = \tfrac12(1 \pm \sqrt5)$. Therefore,

\[ b/a = \gamma = \tfrac12(1 + \sqrt5) \quad\text{or}\quad b/a = \delta = \tfrac12(1 - \sqrt5). \]

But $\delta$ is negative, and so we must have

\[ b/a = \gamma = \tfrac12(1 + \sqrt5). \]


Figure 2.10. Golden rectangle.


The number $\gamma = 1.61803\ldots$ is called the golden ratio. Since both $\gamma$ and $\delta$
are roots of $x^2 - x - 1$, we have

\[ \gamma^2 = \gamma + 1 \quad\text{and}\quad \delta^2 = \delta + 1. \tag{2.4} \]

So, what’s the connection of the golden ratio to the Fibonacci sequence?
We discovered the closed form for the sequence $c_n$ in Proposition 2.22 by
tabulating the first few terms of the sequence and looking for regularity: it
seemed “almost” exponential, off by 1 from a power of 2.

Let’s tabulate the first few ratios of consecutive terms $F_n / F_{n-1}$ of the
Fibonacci sequence.

\begin{align*}
F_2/F_1 &= 1/1 = 1 \\
F_3/F_2 &= 2/1 = 2 \\
F_4/F_3 &= 3/2 = 1.5 \\
F_5/F_4 &= 5/3 = 1.666\ldots \\
F_6/F_5 &= 8/5 = 1.6 \\
F_7/F_6 &= 13/8 = 1.625.
\end{align*}

If you tabulate a few more ratios (try it), a conjecture emerges: it appears that
the ratio of two consecutive terms in the Fibonacci sequence might converge to
the golden ratio $\gamma \approx 1.61803$ (if the ratios were actually constant, $F_n$ would
be a geometric sequence (why?)). This is, in fact, the case, and you’ll see, in
Exercise 2.50 on page 78, how to refine the conjecture into the statement of the
following theorem (the exercise will also help you develop a method that will
let you find closed forms for many 2-term recurrences).
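Tabulating “a few more ratios” is quick to do by machine. This sketch (function and variable names are ours) uses the indexing $F_0 = 0$, $F_1 = 1$ from the definition and prints $F_n / F_{n-1}$ next to its distance from the golden ratio.

```python
from math import sqrt


def fib(n):
    """nth Fibonacci number with F_0 = 0 and F_1 = 1, computed iteratively."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


GAMMA = (1 + sqrt(5)) / 2   # the golden ratio, about 1.61803

for n in range(2, 16):
    ratio = fib(n) / fib(n - 1)
    print(n, ratio, abs(ratio - GAMMA))
```

The printed distances shrink rapidly and the ratios alternate above and below $\gamma$, which is numerical evidence for the conjecture, not a proof of it.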

Theorem 2.32. For all $n \ge 0$, the nth term of the Fibonacci sequence satisfies

\[ F_n = \frac{1}{\sqrt5}\left(\gamma^n - \delta^n\right), \]

where $\gamma = \tfrac12(1 + \sqrt5)$ and $\delta = \tfrac12(1 - \sqrt5)$.

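Proof. We use strong induction because the inductive step involves the
formulas for both $F_{n-1}$ and $F_{n-2}$. The base steps $S(0)$ and $S(1)$ are true:

\begin{align*}
\tfrac{1}{\sqrt5}\left(\gamma^0 - \delta^0\right) &= 0 = F_0 \\
\tfrac{1}{\sqrt5}(\gamma - \delta) &= \tfrac{1}{\sqrt5}\left(\tfrac12(1 + \sqrt5) - \tfrac12(1 - \sqrt5)\right) = 1 = F_1.
\end{align*}

If $n \ge 2$, then

\begin{align*}
F_n &= F_{n-1} + F_{n-2} \\
    &= \tfrac{1}{\sqrt5}\left(\gamma^{n-1} - \delta^{n-1}\right) + \tfrac{1}{\sqrt5}\left(\gamma^{n-2} - \delta^{n-2}\right) \\
    &= \tfrac{1}{\sqrt5}\left[\left(\gamma^{n-1} + \gamma^{n-2}\right) - \left(\delta^{n-1} + \delta^{n-2}\right)\right] \\
    &= \tfrac{1}{\sqrt5}\left[\gamma^{n-2}(\gamma + 1) - \delta^{n-2}(\delta + 1)\right] \\
    &= \tfrac{1}{\sqrt5}\left[\gamma^{n-2}(\gamma^2) - \delta^{n-2}(\delta^2)\right] && \text{by Eq. (2.4)} \\
    &= \tfrac{1}{\sqrt5}\left(\gamma^n - \delta^n\right). \qquad ■
\end{align*}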
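The closed form in Theorem 2.32 can be compared with the recurrence numerically. The sketch below is a floating-point check under our own naming, so the closed form is rounded to the nearest integer before comparing; for much larger n the rounding error of `float` arithmetic would eventually spoil the comparison, which is a limitation of the check and not of the theorem.

```python
from math import sqrt

GAMMA = (1 + sqrt(5)) / 2
DELTA = (1 - sqrt(5)) / 2


def fib_recurrence(n):
    """F_n from the definition: F_0 = 0, F_1 = 1, F_n = F_{n-1} + F_{n-2}."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def fib_closed_form(n):
    """The expression (gamma^n - delta^n) / sqrt(5) from Theorem 2.32."""
    return (GAMMA ** n - DELTA ** n) / sqrt(5)


for n in range(0, 31):
    assert round(fib_closed_form(n)) == fib_recurrence(n)
```

Because $|\delta| < 1$, the term $\delta^n$ shrinks toward 0, which is why $F_n$ looks “almost” exponential with ratio $\gamma$.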

Isn’t it curious that the integers $F_n$ are expressed in terms of the irrational
number $\sqrt5$?

Corollary 2.33. $F_n > \gamma^{n-2}$ for all integers $n \ge 3$, where $\gamma = \tfrac12(1 + \sqrt5)$.

Proof. The proof is by induction on $n \ge 3$. The base step $S(3)$ is true, for
$F_3 = 2 > \gamma \approx 1.618$. For the inductive step, we must show that $F_{n+1} > \gamma^{n-1}$.
By the inductive hypothesis,

\begin{align*}
F_{n+1} = F_n + F_{n-1} &> \gamma^{n-2} + \gamma^{n-3} \\
 &= \gamma^{n-3}(\gamma + 1) = \gamma^{n-3}\gamma^2 = \gamma^{n-1}. \qquad ■
\end{align*}


If $n = 2$, then $F_2 = 1 = \gamma^0$, and so there is equality, not inequality.


Exercises


2.43 Show that the following functions agree on all natural numbers.

\[
f(n) = 3n + 5 \quad\text{and}\quad g(n) =
\begin{cases}
5 & \text{if } n = 0 \\
g(n-1) + 3 & \text{if } n > 0.
\end{cases}
\]

2.44 Show that the following two functions agree on all natural numbers.

\[
f(n) = 4^n \quad\text{and}\quad g(n) =
\begin{cases}
4 & \text{if } n = 0 \\
4g(n-1) & \text{if } n > 0.
\end{cases}
\]

2.45 Define the function $h$ inductively:

\[
h(n) =
\begin{cases}
4 & \text{if } n = 0 \\
h(n-1) + 2n & \text{if } n > 0.
\end{cases}
\]

Find a polynomial function $p$ that agrees with $h$ on all natural numbers, and prove
that your functions are equal on $\mathbb{N}$.

Answer. $n^2 + n + 4$.

2.46 Define the function $m$ inductively:

\[
m(n) =
\begin{cases}
0 & \text{if } n = 0 \\
m(n-1) + n^2 & \text{if } n > 0.
\end{cases}
\]

Find a polynomial function $s$ that agrees with $m$ on all natural numbers, and prove
that your functions are equal on $\mathbb{N}$.

Answer. $\dfrac{2n^3 + 3n^2 + n}{6} = \dfrac{n(n+1)(2n+1)}{6}$.

2.47 Consider the two functions $f$ and $g$:

\[ f(x) = x^4 - 6x^3 + 14x^2 - 6x + 2 \]

and

\[
g(x) =
\begin{cases}
2 & \text{if } x = 0 \\
g(x-1) + 6x - 3 & \text{if } x > 0.
\end{cases}
\]

Are $f$ and $g$ equal on $\mathbb{N}$?

2.48 Find a formula for $0^2 + 1^2 + 2^2 + \cdots + (n-1)^2$ as a function of $n$, and prove
your assertion.

2.49 You saw, on page 76, that the ratio of two consecutive terms seems to converge to
the golden ratio. Using only the recurrence

\[ F_n = F_{n-1} + F_{n-2} \quad\text{for all } n \ge 2, \]

show that

\[ \lim_{n \to \infty} \frac{F_n}{F_{n-1}} = \tfrac12(1 + \sqrt5). \]


2.50 * You saw, on page 76, that the Fibonacci sequence seems to be “almost”
exponential.

(i) Suppose the Fibonacci sequence actually was exponential: $F_n = r^n$. Show
that $r$ would have to be either

\[ \gamma = \frac{1 + \sqrt5}{2} \quad\text{or}\quad \delta = \frac{1 - \sqrt5}{2}. \]

(ii) Show that the sequences $\gamma^n$ and $\delta^n$ satisfy the recurrence

\[ f_n = f_{n-1} + f_{n-2}. \]

(iii) If $a$ and $b$ are any real numbers, show that $a\gamma^n + b\delta^n$ satisfies the recurrence

\[ f_n = f_{n-1} + f_{n-2}. \]

(iv) Without using Theorem 2.32, find $a$ and $b$ so that

\[ a\gamma^n + b\delta^n = F_n. \]


2.51 Ms. D’Amato likes to take a different route to work every day. She will quit her 
job the day she has to repeat her route. Her home and work are pictured in the grid 
of streets in Figure 2.11. If she never backtracks (she only travels north or east), 
how many days will she work at her job? 



Figure 2.11. Ms. D’Amato.

2.52 Find a closed form for each of the following functions and prove your assertions.

(i)
\[
f(n) =
\begin{cases}
4 & n = 0 \\
f(n-1) + 3 & n > 0.
\end{cases}
\]

Answer. $f(n) = 3n + 4$.

(ii)
\[
f(n) =
\begin{cases}
4 & n = 0 \\
3f(n-1) & n > 0.
\end{cases}
\]

Answer. $f(n) = 4 \cdot 3^n$.

(iii)
\[
f(n) =
\begin{cases}
2 & n = 0 \\
4 & n = 1 \\
4f(n-1) - 3f(n-2) & n > 1.
\end{cases}
\]

Answer. $f(n) = 3^n + 1$.

(iv)
\[
f(n) =
\begin{cases}
4 & n = 0 \\
4 & n = 1 \\
4f(n-1) - 3f(n-2) & n > 1.
\end{cases}
\]

Answer. $f(n) = 4$.

2.53 Find a closed form for the following function and prove your assertion.

\[
f(n) =
\begin{cases}
3 & n = 0 \\
4 & n = 1 \\
14 & n = 2 \\
4f(n-1) - f(n-2) - 6f(n-3) & n > 2.
\end{cases}
\]

2.54 Take It Further. Find or generate a copy of the first 30 rows of Pascal’s triangle.
Color the odd numbers red and the even numbers black. Explain any patterns that
you see. (Alternatively, you can use a spreadsheet to generate the triangle of 0s
and 1s that are the remainders when each entry is divided by 2.) For more on this
exercise, see

ecademy.agnesscott.edu/~lriddle/ifs/siertri/Pascalmath.htm





3 

Renaissance 


For centuries, the Western World believed that the high point of civilization 
took place from the Greek and Roman eras through the beginning of Chris- 
tianity. But this worldview began to change dramatically about five hundred 
years ago. The printing press was invented around 1450 by Johannes Gutenberg,
Christopher Columbus landed in North America in 1492, Martin Luther
began the Reformation in 1517, and Nicolas Copernicus published De
Revolutionibus in 1543.

Mathematics was also developing. A formula giving the roots of certain 
cubic polynomials, similar to the quadratic formula, was discovered by Scipi- 
one del Ferro around 1515; by 1545, it was extended to all cubics by Fontana 
(Tartaglia) and Cardano. The cubic formula contributed to the change in world- 
view that was the essence of the Renaissance, for it was one of the first math- 
ematical results not known to the ancients. But its impact on contemporary 
mathematics was much deeper, for it introduced complex numbers. As we shall 
see, the cubic formula is not as useful for numerical computations as we’d like, 
because it often gives roots in unrecognizable form. Its importance, however, 
lies in the ideas it generated. Trying to understand the formula, searching for 
generalizations of it, and studying questions naturally arising from such en- 
deavors, were driving forces in the development of abstract algebra. 

In many high school algebra courses today, the complex numbers, usually
denoted by C, are introduced to find the roots of $ax^2 + bx + c$ when $b^2 - 4ac < 0$.
That’s not how it happened. Square roots of negative real numbers occur in
the cubic formula, but not as roots; indeed, in the 16th century, complex roots
would have been considered useless. But complex numbers arose in the middle
of calculations, eventually producing real numbers (we will see this explicitly
in the next section). To understand this phenomenon, mathematicians were
forced to investigate the meaning of number: are complex numbers bona fide
numbers? Are negative numbers bona fide numbers?

Section 3.1 discusses the classical formulas giving the roots of cubic and 
quartic polynomials. We will look more carefully at the complex numbers 
themselves in Section 3.2. Although initially used in purely algebraic contexts, 
C has a rich geometric and analytic structure that, when taken together with its 
algebraic properties, can tie together many of the ideas in high school mathe- 
matics. Indeed, C finds applications all over mathematics. Section 3.4 uses C 
to solve some problems that are especially useful for teachers (and interesting 
for all mathematicians). Just as the method of Diophantus was used to create 
Pythagorean triples, C can be used to invent problems whose solutions “come
out nice.”


3.1 Classical Formulas

As Europe emerged from the Dark Ages, a major open problem in mathemat- 
ics was finding solutions to polynomial equations. The quadratic formula had 
been known for about four thousand years and, arising from a tradition of pub- 
lic mathematical contests in Pisa and Venice, formulas for the roots of cubics 
and quartics had been found in the early 1500s. Let’s look at these formulas 
in modern algebraic notation; we will assume for now that the complex num- 
bers obey the usual laws of arithmetic (neither of these simplifying steps was 
available to mathematicians of the 16th century). 


Historical Note. Modern arithmetic notation was introduced in the late
1500s, but it was not generally agreed upon in Europe until after the
influential book of Descartes, La Geometrie, was published in 1637 (before then,
words and abbreviations were used as well as various competing notations).
The symbols $+$, $-$, and $\sqrt{\phantom{x}}$, as well as the symbol / for division, as in 4/5,
were introduced by Widman in 1486. The equality sign, $=$, was invented by
Recorde in 1557. Designating variables by letters was invented by Viete in
1591, who used consonants to denote constants and vowels to denote
variables; the modern notation, using the letters a, b, c, ... to denote constants and
the letters x, y, z at the end of the alphabet to denote variables, was introduced
by Descartes in 1637. The exponential notation $2^2, 2^3, 2^4, \ldots$ was essentially
invented by Hume in 1636, who wrote $2^{ii}, 2^{iii}, 2^{iv}, \ldots$. The symbol $\times$ for
multiplication was introduced by Oughtred in 1631; the symbol $\div$ for division was
introduced by Rahn in 1659. See Cajori [6].


Cubics 

The following familiar fact (to be proved in Chapter 6) was known and used 
by Renaissance mathematicians, and we will use it in this section. 

Proposition 6.15. If $r$ is a root of a polynomial $f(x)$, then $x - r$ is a factor
of $f(x)$; that is, $f(x) = (x - r)g(x)$ for some polynomial $g(x)$.

One of the simplest cubics is $f(x) = x^3 - 1$. Obviously, 1 is a root of $f$, and
so $x^3 - 1 = (x - 1)g(x)$, where

\[ g(x) = (x^3 - 1)/(x - 1) = x^2 + x + 1. \]

The roots of $g$ (and, hence, also of $f$) are

\[ \omega = \tfrac12\left(-1 + i\sqrt3\right) \quad\text{and}\quad \overline{\omega} = \tfrac12\left(-1 - i\sqrt3\right), \]

by the quadratic formula. Both $\omega$ and $\overline{\omega}$ are called cube roots of unity, for
$\omega^3 = 1 = \overline{\omega}^3$. Note that $\overline{\omega} = \omega^2 = 1/\omega$.

We know that a positive number $a$ has two square roots. By convention, $\sqrt{a}$
denotes the positive square root, so that the two square roots are $\pm\sqrt{a}$. Any
nonzero real number $a$ has three cube roots. By convention, $\sqrt[3]{a}$ denotes the real cube
root, so that the three cube roots are $\sqrt[3]{a}$, $\omega\sqrt[3]{a}$, $\omega^2\sqrt[3]{a}$. Thus, cube roots of
unity generalize $\pm$.
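The identities for the cube roots of unity are easy to confirm with complex floating-point arithmetic. In this sketch the helper names `close` and `cube_roots` and the tolerance are our own choices, and equality is tested only up to rounding error.

```python
import cmath

omega = (-1 + 1j * cmath.sqrt(3)) / 2      # (1/2)(-1 + i*sqrt(3))
omega_bar = omega.conjugate()              # (1/2)(-1 - i*sqrt(3))


def close(z, w, tol=1e-12):
    """True when two complex numbers agree up to a small rounding tolerance."""
    return abs(z - w) < tol


assert close(omega ** 3, 1)
assert close(omega_bar ** 3, 1)
assert close(omega_bar, omega ** 2)
assert close(omega_bar, 1 / omega)


def cube_roots(a):
    """The three complex cube roots of a real number a, real root first."""
    real_root = abs(a) ** (1 / 3) * (1 if a >= 0 else -1)
    return [real_root, omega * real_root, omega ** 2 * real_root]


print(cube_roots(8))
```

Multiplying the real cube root by $\omega$ and $\omega^2$ plays the same role for cube roots that the sign $\pm$ plays for square roots.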

The general cubic equation $aX^3 + bX^2 + cX + d = 0$ can be simplified
by dividing both sides by $a$; this procedure does not affect the roots, and so
we may assume that $a = 1$. Thus, we seek the roots of the cubic polynomial
$F(X) = X^3 + bX^2 + cX + d$, where $b, c, d \in \mathbb{R}$. The change of variable

\[ X = x - \tfrac13 b \]

yields a simpler polynomial, $f(x) = F(x - \tfrac13 b) = x^3 + qx + r$, where $q$
and $r$ are expressions in $b$, $c$, and $d$. We call $f$ the reduced polynomial arising
from $F$.

Lemma 3.1. Let $f(x) = x^3 + qx + r$ be the reduced polynomial arising from
$F(X) = X^3 + bX^2 + cX + d$. If $u$ is a root of $f$, then $u - \tfrac13 b$ is a root
of $F$.

Proof. Since $f(x) = F(x - \tfrac13 b)$ for all $x$, we have $0 = f(u) = F(u - \tfrac13 b)$;
that is, $u - \tfrac13 b$ is a root of $F(X)$. ■

We will use the following consequence of the quadratic formula. 


Exercise 3.5 on page 89 asks you to check that the coefficient of $x^2$ in
$F(x - \tfrac13 b) = (x - \tfrac13 b)^3 + b(x - \tfrac13 b)^2 + c(x - \tfrac13 b) + d$
is zero.


Lemma 3.2. Given a pair of numbers $M$ and $N$, there are (possibly complex)
numbers $g$ and $h$ with $g + h = M$ and $gh = N$. In fact, $g$ and $h$ are roots of
$x^2 - Mx + N$.


Proof. We have

$(x - g)(x - h) = x^2 - (g + h)x + gh.$

Thus, the roots $g$, $h$ of $f(x) = x^2 - Mx + N$ (which exist, thanks to the
quadratic formula) satisfy the given equations $g + h = M$ and $gh = N$. ■
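Lemma 3.2 translates directly into a few lines of code. This is a hedged
sketch rather than anything from the text: the function name is hypothetical,
and `cmath.sqrt` is used so that a negative discriminant produces complex
numbers instead of an error.

```python
import cmath


def pair_with_sum_and_product(M, N):
    """Return (g, h) with g + h = M and g*h = N, i.e. the two roots
    of x^2 - M x + N given by the quadratic formula."""
    root_of_disc = cmath.sqrt(M * M - 4 * N)  # complex square root
    return (M + root_of_disc) / 2, (M - root_of_disc) / 2


# sum 35 and product 216
print(pair_with_sum_and_product(35, 216))
# sum 3 and product 3 forces complex numbers
g, h = pair_with_sum_and_product(3, 3)
print(g + h, g * h)
```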

Let’s try to find a general method for solving cubic equations — a method 
that doesn’t depend on the specific values of the coefficients — by first solving 
a numerical equation. 

Consider the polynomial f(x) = x 3 — 18x — 35. Since the constant term 
35 = 5 • 7, we check whether ±1, ±5, ±7 are roots. It turns out that 5 is a 
root and, dividing by x — 5, we can find the other two roots by solving the 
quadratic f(x)/(x — 5) = x 2 + 5x + 7. But we are looking for a general 
method applicable to other cubics, so let’s pretend we don’t know that 5 is a 
root. 

You can check that the other two roots are complex. Renaissance
mathematicians would have dismissed these as meaningless. But stay tuned:
we’ll soon see that they, too, can be generated by the emerging method.

It’s natural to look for a polynomial identity having the same form as the
equation we are trying to solve. Example 2.27 provides one. From

$a^3 + b^3 = (a + b)^3 - 3ab(a + b),$

we have the identity

$(a + b)^3 - 3ab(a + b) - (a^3 + b^3) = 0.$

Thinking of $a + b$ as a single “chunk,” say, $x = a + b$, the correspondence
looks like this:

$$
\begin{array}{ccccl}
(a+b)^3 & -3ab\,\cdot & (a+b) & -(a^3+b^3) & {}= 0 \\
\updownarrow & & \updownarrow & & \\
\square^3 & -3ab\,\cdot & \square & -(a^3+b^3) & {}= 0 \\
\updownarrow & \updownarrow & \updownarrow & \updownarrow & \\
x^3 & -18\,\cdot & x & -35 & {}= 0.
\end{array}
$$

So, if we can find two numbers $a$ and $b$ such that

$-3ab = -18$ and $a^3 + b^3 = 35,$

then $a + b$ will be a root of the cubic. Hence we want $a$ and $b$ so that $ab = 6$
and $a^3 + b^3 = 35$. There’s an obvious solution here, namely, $a = 3$ and $b = 2$,
but we’re looking for a general method. Cubing both sides of $ab = 6$, we get

$a^3 b^3 = 216$ and $a^3 + b^3 = 35.$

By Lemma 3.2, $a^3$ and $b^3$ are roots of the quadratic equation

$x^2 - 35x + 216 = 0.$

The roots of this are 27 and 8. So we can take $a^3 = 27$, $b^3 = 8$; surprise!
$a = 3$ and $b = 2$. Hence, $3 + 2 = 5$ is a root of our original cubic.

The next theorem is usually attributed to Scipione del Ferro; we’ll use com- 
plex numbers and modern notation in its statement and proof, neither of which 
was available at the time. In light of Lemma 3.1, we may assume that cubics 
are reduced. 


Theorem 3.3 (Cubic Formula). The roots of $f(x) = x^3 + qx + r$ are

$g + h$, $\omega g + \omega^2 h$, and $\omega^2 g + \omega h$,

where $\omega = \tfrac{1}{2}(-1 + i\sqrt{3})$ is a cube root of unity,

$g = \sqrt[3]{\tfrac{1}{2}\left(-r + \sqrt{R}\right)}$, $h = -\dfrac{q}{3g}$, and $R = r^2 + \dfrac{4q^3}{27}$.

Proof. Let $u$ be a root of $f(x) = x^3 + qx + r$ and, as in the discussion above,
we try

$u = g + h.$

We are led to

$g^3 + h^3 = -r$
$gh = -\tfrac{1}{3}q.$

Cube $gh = -\tfrac{1}{3}q$, obtaining the pair of equations

$g^3 + h^3 = -r$
$g^3 h^3 = -\tfrac{1}{27}q^3.$

Lemma 3.2 gives a quadratic equation whose roots are $g^3$ and $h^3$:

$x^2 + rx - \tfrac{1}{27}q^3 = 0.$ (3.1)

The quadratic formula gives

$g^3 = \tfrac{1}{2}\left(-r + \sqrt{r^2 + \tfrac{4}{27}q^3}\right) = \tfrac{1}{2}\left(-r + \sqrt{R}\right)$

and

$h^3 = \tfrac{1}{2}\left(-r - \sqrt{r^2 + \tfrac{4}{27}q^3}\right) = \tfrac{1}{2}\left(-r - \sqrt{R}\right).$

Now there are three cube roots of $g^3$, namely, $g$, $\omega g$, and $\omega^2 g$. Because
of the constraint $gh = -\tfrac{1}{3}q$, each has a “mate,” namely, $-q/(3g) = h$,
$-q/(3\omega g) = \omega^2 h$, and $-q/(3\omega^2 g) = \omega h$. Thus, the roots of $f$ are

$g + h$, $\omega g$ + its mate, $\omega^2 g$ + its mate;

that is, the roots of $f$ are $g + h$, $\omega g + \omega^2 h$, and $\omega^2 g + \omega h$. ■

Example 3.4 (Good Example). If $f(x) = x^3 - 15x - 126$, then $q = -15$,
$r = -126$, $R = 15376$, and $\sqrt{R} = 124$. Hence, $g^3 = 125$, so we can take
$g = 5$. Thus, $h = -q/3g = 1$. Therefore, the roots of $f$ are

$6$, $5\omega + \omega^2 = -3 + 2i\sqrt{3}$, $5\omega^2 + \omega = -3 - 2i\sqrt{3}$.

For Renaissance mathematicians, this cubic would have only one root; they
would have ignored the complex roots. ▲

But things don’t always work out as we expect, as the next surprising ex- 
ample shows. 

Example 3.5 (Bad Example). The cubic formula may give the roots in
unrecognizable form. Let

$f(x) = (x - 1)(x - 2)(x + 3) = x^3 - 7x + 6;$

the roots of $f$ are, obviously, 1, 2, and $-3$. But the cubic formula gives

$g^3 = \tfrac{1}{2}\left(-6 + \sqrt{\tfrac{-400}{27}}\right)$ and $h^3 = \tfrac{1}{2}\left(-6 - \sqrt{\tfrac{-400}{27}}\right).$

It is not at all obvious that $g + h$ is a real number, let alone an integer! ▲
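A computer has no trouble evaluating the cubic formula, even in the Bad
Example. The following Python sketch of Theorem 3.3 is an illustration under
stated assumptions, not a robust solver: the names are hypothetical, it assumes
$q \neq 0$ (so that $g \neq 0$ and the mate $h = -q/(3g)$ is defined), it takes the
principal complex cube root for $g$, and its answers carry floating-point error.

```python
import cmath

OMEGA = complex(-0.5, 3 ** 0.5 / 2)  # cube root of unity


def cubic_roots(q, r):
    """Roots of x^3 + q x + r from the cubic formula (Theorem 3.3).

    Assumes q != 0, so that g != 0 and h = -q/(3g) makes sense.
    """
    if q == 0:
        raise ValueError("this sketch assumes q != 0")
    R = r * r + 4 * q ** 3 / 27
    g = ((-r + cmath.sqrt(R)) / 2) ** (1 / 3)  # one cube root of g^3
    h = -q / (3 * g)                           # its mate, so g*h = -q/3
    return [g + h, OMEGA * g + OMEGA ** 2 * h, OMEGA ** 2 * g + OMEGA * h]


def show(roots):
    """Round for display only."""
    return [complex(round(z.real, 6), round(z.imag, 6)) for z in roots]


print(show(cubic_roots(-15, -126)))  # Example 3.4
print(show(cubic_roots(-7, 6)))      # Example 3.5
```

For $x^3 - 7x + 6$ the three computed values agree, up to rounding, with 1, 2,
and $-3$ in some order; which of them appears as $g + h$ depends on the cube root
chosen for $g$.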

Imagine yourself, standing in Piazza San Marco in Venice in 1520,
participating in a contest. Your opponent challenges you to find a root of
$f(x) = x^3 - 7x + 6$ (he invented the cubic, so he knows that it comes from
$(x - 1)(x - 2)(x + 3)$). Still, you are a clever rascal; your mentor taught you
the cubic formula. You do as you were taught, and triumphantly announce that
a root is $g + h$, where $g^3$, $h^3$ are the awful expressions above. Most likely, the
judges would agree that your opponent, who says that 1 is a root, has defeated
you. After all, $f(1) = 1 - 7 + 6 = 0$, so that 1 is, indeed, a root. The judges
even snicker when they ask you to evaluate $f(g + h)$.

With head hung low, you return home. Can you simplify your answer? Why
is $g + h$ equal to 1? Let’s pretend you have modern notation. Well,

$g^3 = \tfrac{1}{2}\left(-6 + \sqrt{\tfrac{-400}{27}}\right) = -3 + i\,\dfrac{10}{3\sqrt{3}}.$

The first question is how to compute cube roots of “numbers” of the form
$a + bi$, where $i^2 = -1$. Specifically, we want $u + iv$ with

$(u + iv)^3 = -3 + i\,\dfrac{10}{3\sqrt{3}}.$

We’ll see how to find the roots of a complex number in Section 3.3.

Hmm! Perhaps it’s smart to separate terms involving $i$ from honest numbers.

$$
\begin{aligned}
(u + iv)^3 &= u^3 + 3u^2(iv) + 3u(iv)^2 + (iv)^3 \\
&= u^3 + 3u^2 iv - 3uv^2 - iv^3 \\
&= u^3 - 3uv^2 + i(3u^2 v - v^3).
\end{aligned}
$$

Let’s see if the separation pays off. We want numbers $u$, $v$ with $u^3 - 3uv^2 = -3$
and $3u^2 v - v^3 = \dfrac{10}{3\sqrt{3}}$. These equations are intractable! Sigh.

But see Example 3.36, or try to solve the system $u^3 - 3uv^2 = -3$ and
$3u^2 v - v^3 = \dfrac{10}{3\sqrt{3}}$ with a computer.

In Example 3.4, alternatively, having found one root to be 6, the other two
roots can be found as the roots of the quadratic $f(x)/(x - 6) = x^2 + 6x + 21$.

Cube roots are tough. Let’s simplify things; perhaps solving a simpler prob- 
lem, say, finding square roots, can give a clue to finding cube roots. And this 
we can do. 

Proposition 3.6. Every complex number $a + bi$ has a square root.

Proof. If $b = 0$, then $a + ib = a$. If $a \ge 0$, then $\sqrt{a}$ is well-known; if $a < 0$,
then $a = -c$, where $c > 0$, and $\sqrt{a} = i\sqrt{c}$. We can now assume that $b \ne 0$,
and our task is to find $u + iv$ with

$(u + iv)^2 = u^2 + 2iuv - v^2 = a + ib;$

that is, we seek numbers $u$, $v$ such that

$u^2 - v^2 = a$ (3.2)

and

$2uv = b.$ (3.3)

Since $b \ne 0$, Eq. (3.3) gives $u \ne 0$; define $v = b/2u$. Substituting into
Eq. (3.2), we have

$u^2 - (b/2u)^2 = a;$

rewriting,

$4u^4 - 4au^2 - b^2 = 0.$

This is a quadratic in $u^2$, and the quadratic formula gives

$u^2 = \tfrac{1}{8}\left(4a \pm \sqrt{16a^2 + 16b^2}\right) = \tfrac{1}{2}\left(a \pm \sqrt{a^2 + b^2}\right).$

Since $a^2 + b^2 > 0$, it has a real square root. Now $\tfrac{1}{2}\left(a + \sqrt{a^2 + b^2}\right)$ is positive
(because $b^2 > 0$ implies $|a| < \sqrt{a^2 + b^2}$); hence, we can find its (real) square
root $u$ as well as $v = b/2u$. ■

For example, our proof gives a method for finding a square root of $i$. Set $a = 0$
and $b = 1$ to obtain

$i = \left(\tfrac{1}{\sqrt{2}}(1 + i)\right)^2.$

Alas, this square root success doesn’t lead to a cube root success, although it
does give us some confidence that our manipulations may be legitimate.

You can now appreciate the confusion produced by the cubic formula; a
cloud enveloped our ancestors. First of all, what are these “numbers” $a + ib$?
Sometimes they can help. Can we trust them to always give us the truth? Is it
true that we can separate terms involving $i$ from those that don’t? When are
two complex numbers equal? Does it make sense to do arithmetic with these
guys? Do they obey the nine properties of arithmetic on page 40 that familiar
numbers do? It took mathematicians about 100 years to become comfortable
with complex numbers, and another 100 years until all was set on a firm
foundation.
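The proof of Proposition 3.6 is constructive, so it can be turned into a short
program. The sketch below follows the cases of the proof; the function name
`complex_sqrt` is hypothetical, and the results are floating-point
approximations rather than exact values.

```python
import math


def complex_sqrt(a, b):
    """Return real numbers (u, v) with (u + iv)^2 = a + ib,
    following the case analysis in the proof of Proposition 3.6."""
    if b == 0:
        if a >= 0:
            return math.sqrt(a), 0.0       # ordinary real square root
        return 0.0, math.sqrt(-a)          # sqrt(a) = i*sqrt(c) with c = -a
    # b != 0: take the positive root of the quadratic in u^2
    u = math.sqrt((a + math.hypot(a, b)) / 2)
    v = b / (2 * u)                        # from Eq. (3.3): 2uv = b
    return u, v


u, v = complex_sqrt(0, 1)                  # a square root of i
print(u, v, complex(u, v) ** 2)
```

Setting $a = 0$ and $b = 1$ returns $u = v \approx 1/\sqrt{2}$, matching the square root of $i$
found in the text.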

Quartics 

A method for solving fourth degree equations was found by Lodovico Ferrari 
in the 1540s, but we present the version given by Descartes in 1637. 

Consider the quartic $F(X) = X^4 + bX^3 + cX^2 + dX + e$. The change
of variable $X = x - \tfrac{1}{4}b$ yields a simpler polynomial, $f(x) = F(x - \tfrac{1}{4}b) =
x^4 + qx^2 + rx + s$, whose roots give the roots of $F$: if $u$ is a root of $f$, then
$u - \tfrac{1}{4}b$ is a root of $F$ (see Exercise 3.6 on page 89). Write $f$ as a product of
two quadratics:

$f(x) = x^4 + qx^2 + rx + s = (x^2 + jx + \ell)(x^2 - jx + m),$

and determine $j$, $\ell$, and $m$ (note that the coefficients of the linear terms in the
quadratic factors are $j$ and $-j$ because $f$ has no cubic term). Expanding and
equating like coefficients gives the equations

$\ell + m - j^2 = q,$
$j(m - \ell) = r,$
$\ell m = s.$

The first two equations give (since $j(m - \ell) = r$, we have $-\ell + m = r/j$)

$2m = j^2 + q + r/j,$
$2\ell = j^2 + q - r/j.$

Substituting these values for $m$ and $\ell$ into the third equation and simplifying
yield a degree 6 polynomial which is a cubic in $j^2$ (called the resolvent cubic):

$(j^2)^3 + 2q(j^2)^2 + (q^2 - 4s)j^2 - r^2.$

The cubic formula gives a root $j^2$, from which we can determine $m$ and $\ell$ and,
hence, the roots of the quartic.

This process is an algorithm that can easily be encoded in a computer al- 
gebra system; it is known as the quartic formula. The quartic formula has the 
same disadvantage as the cubic formula: even though it gives correct answers, 
the values it gives for the roots are usually unrecognizable. But there are some 
good examples. 

Example 3.7. Let’s find the roots of

$f(x) = x^4 - 10x^2 + 1.$

First, factor $f$:

$x^4 - 10x^2 + 1 = (x^2 + jx + \ell)(x^2 - jx + m);$

in our earlier notation, $q = -10$, $r = 0$, and $s = 1$. The quartic formula shows
us how to find $j$, $\ell$, $m$. Since $r = 0$, we have $2\ell = j^2 - 10 = 2m$; hence,
$\ell = m$. But $\ell m = 1$, so that either $\ell = 1$ and $j^2 = 12$ or $\ell = -1$ and $j^2 = 8$.
Taking $\ell = 1$ and $j^2 = 12$ gives

$f(x) = (x^2 + \sqrt{12}\,x + 1)(x^2 - \sqrt{12}\,x + 1),$

and the quadratic formula gives the four roots of $f$:

$\alpha = \sqrt{2} + \sqrt{3}$, $\beta = -\sqrt{2} + \sqrt{3}$, $\gamma = \sqrt{2} - \sqrt{3}$, $\delta = -\sqrt{2} - \sqrt{3}$. ▲

What’s going on? Why is there a choice for $\ell$ and $j^2$? If the roots of $f$ are
$\alpha_i$ for $1 \le i \le 4$, then $f(x) = \prod_i (x - \alpha_i)$. A factorization of $f$ into
quadratics arises from grouping these four factors into pairs, and there is
no reason why different groupings should give the same quadratics. Of course,
any such factorization gives the same roots of $f$.

Abel (1802-1829) also died young.
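The last steps of Descartes’ method can be sketched in code. This is an
illustration, not the text’s algorithm in full: it assumes a nonzero root $j^2$ of
the resolvent cubic is already in hand (here $j^2 = 12$ from Example 3.7) instead
of calling the cubic formula, and the function and variable names are
hypothetical.

```python
import cmath


def resolvent(q, r, s, t):
    """Value of the resolvent cubic at t = j^2."""
    return t ** 3 + 2 * q * t ** 2 + (q * q - 4 * s) * t - r * r


def quartic_roots_from_j2(q, r, s, j2):
    """Roots of x^4 + q x^2 + r x + s, given a nonzero root j2 of the
    resolvent cubic (assumed supplied by the caller)."""
    j = cmath.sqrt(j2)
    m = (j2 + q + r / j) / 2
    ell = (j2 + q - r / j) / 2
    roots = []
    # factors x^2 + j x + ell and x^2 - j x + m, each solved by the quadratic formula
    for lin, const in ((j, ell), (-j, m)):
        root_of_disc = cmath.sqrt(lin * lin - 4 * const)
        roots += [(-lin + root_of_disc) / 2, (-lin - root_of_disc) / 2]
    return roots


print(resolvent(-10, 0, 1, 12))               # 12 is a root of the resolvent
for z in quartic_roots_from_j2(-10, 0, 1, 12):
    print(round(z.real, 6), round(z.imag, 6))
```

The four printed values are numerical approximations to $\pm\sqrt{2} \pm \sqrt{3}$.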

The quadratic formula can be derived in a way similar to the derivations
of the cubic and quartic formulas (in Chapter 1, we derived the formula by
completing the square). The change of variable $X = x - \tfrac{1}{2}b$ replaces the
polynomial $F(X) = X^2 + bX + c$ with the simpler polynomial $f(x) = x^2 + q$,
where $q = c - \tfrac{1}{4}b^2$; the roots $u = \pm\sqrt{-q}$ of $f(x)$ give the roots $u - \tfrac{1}{2}b$ of
$F$. Since the roots of $f$ are

$u = \pm\sqrt{-q} = \pm\sqrt{-\left(c - \tfrac{1}{4}b^2\right)} = \pm\tfrac{1}{2}\sqrt{b^2 - 4c},$

the roots of $F$ are our old friends

$\pm\tfrac{1}{2}\sqrt{b^2 - 4c} - \tfrac{1}{2}b = \tfrac{1}{2}\left(-b \pm \sqrt{b^2 - 4c}\right).$

It is now tempting, as it was for our ancestors, to try to find the roots of the
general quintic $F(X) = X^5 + bX^4 + cX^3 + dX^2 + eX + f$ and to express
them in a form similar to those for quadratic, cubic, and quartic polynomials;
that is, using only extraction of roots, addition, subtraction, multiplication, and
division (of course, our ancestors hoped to find roots of polynomials of any
degree). They began with the change of variable $X = x - \tfrac{1}{5}b$ to eliminate
the $X^4$ term. It was natural to expect that some further ingenious substitution
together with the formulas for roots of polynomials of lower degree, analogous 
to the resolvent cubic, would yield the roots of F . For almost 300 years, no 
such formula was found. But, in 1824, Abel proved that there is no such quintic 
formula. 


How to Think About It. Abel’s theorem is often misquoted. It says: there is 
no formula involving only extraction of roots and the four basic operations of 
arithmetic that expresses the roots of the general quintic polynomial in terms 
of its coefficients. Succinctly, the general quintic is not solvable by radicals. 
But there are other kinds of formulas giving roots of polynomials. For
example, here is a formula, due to Viète, giving the roots in terms of trigonometric
functions. If $f(x) = x^3 + qx + r$ has three real roots, then its roots are $t\cos\theta$,
$t\cos(\theta + 120^\circ)$, $t\cos(\theta + 240^\circ)$, where $t = \sqrt{-4q/3}$ and $\cos(3\theta) = -4r/t^3$
(there are variations using cosh and sinh when $f$ has complex roots ([26],
p. 445-447)). You may recall Newton’s method giving the roots as $\lim_{n\to\infty} x_n$,
where $x_{n+1} = x_n - f(x_n)/f'(x_n)$. Now some quintic polynomials are
solvable by radicals; for example, we’ll see in Section 3.3 that $x^5 - 1$ is one such.
Another theorem of Abel gives a class of polynomials, of any degree, which 
are solvable by radicals. Galois, the young wizard who was killed before his 
21st birthday, characterized all the polynomials which are solvable by radi- 
cals, greatly generalizing Abel’s theorem. We will look at this more closely in 
Chapter 9. 
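Both alternatives mentioned above, Viète’s trigonometric formula and Newton’s
method, are easy to try on the Bad Example $x^3 - 7x + 6$. The Python sketch below
is illustrative: the function names are hypothetical, `viete_roots` assumes the
cubic has three real roots (which requires $q < 0$), the clamp on the cosine is
only a guard against rounding, and the number of Newton steps is an arbitrary
choice.

```python
import math


def viete_roots(q, r):
    """Roots of x^3 + q x + r by Viete's formula, assuming three real roots."""
    t = math.sqrt(-4 * q / 3)
    c = -4 * r / t ** 3                    # this is cos(3*theta)
    c = max(-1.0, min(1.0, c))             # guard against rounding outside [-1, 1]
    theta = math.acos(c) / 3
    # 120 degrees is 2*pi/3 radians
    return [t * math.cos(theta + k * 2 * math.pi / 3) for k in range(3)]


def newton(f, f_prime, x0, steps=50):
    """Iterate x_{n+1} = x_n - f(x_n)/f'(x_n) a fixed number of times."""
    x = x0
    for _ in range(steps):
        x = x - f(x) / f_prime(x)
    return x


print(sorted(viete_roots(-7, 6)))
print(newton(lambda x: x ** 3 - 7 * x + 6, lambda x: 3 * x ** 2 - 7, 3.0))
```

Viète’s formula returns approximations to $-3$, 1, and 2, and Newton’s method
started at $x_0 = 3$ approaches the root 2.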
Exercises

3.1 For each equation, find all roots in $\mathbb{R}$ and in $\mathbb{C}$.

(i) $x^2 - 2x = 15$ (ii) $x^2 - 2x = 16$ (iii) $x^2 - 2x = -16$

(iv) $6x^2 + x = 15$ (v) $6x^2 + x = 16$ (vi) $6x^2 + x = -16$

(vii) $x^2 = 1$ (viii) $x^3 = 1$ (ix) $x^4 = 1$

(x) $x^3 = 8$

3.2 * We know that $i$ satisfies $x^2 + 1 = 0$ in $\mathbb{C}$ (is there another solution?).

(i) Show, for all $n \in \mathbb{Z}$, that the value of $i^n$ is one of $1, i, -1, -i$.

(ii) Use the Division Algorithm to decide which of the four values $i^{247}$ will have.

3.3 Let $\omega = \tfrac{1}{2}(-1 + i\sqrt{3})$ be a cube root of unity.

(i) Show, for every integer $n$, that the value of $\omega^n$ is one of $1, \omega, \omega^2$.

(ii) Use the Division Algorithm to decide, for any fixed $n$, which of the three
values $\omega^n$ will have.

3.4 Find two numbers whose

(i) sum is 5 and product is 6. (ii) sum is 0 and product is $-2$.

(iii) sum is 3 and product is 3. (iv) sum is $-1$ and product is 1.

(v) sum is $b$ and product is $c$ (in terms of $b$ and $c$).

3.5 * If $F(X) = X^3 + bX^2 + cX + d$, show that the change of variable $X = x - \tfrac{1}{3}b$
produces a polynomial $f$ with no quadratic term,

$f(x) = F(x - \tfrac{1}{3}b) = x^3 + qx + r.$

Express $q$ and $r$ in terms of $b$, $c$, and $d$.

3.6 *

(i) Suppose that $F(X) = X^4 + bX^3 + cX^2 + dX + e$.

(a) Show that the change of variable $X = x - \tfrac{1}{4}b$ produces a polynomial $f$
with no cubic term,

$f(x) = F(x - \tfrac{1}{4}b) = x^4 + qx^2 + rx + s.$

Express $q$, $r$, and $s$ in terms of $b$, $c$, $d$, and $e$.

(b) Show that if $u$ is a root of $f$, then $u - \tfrac{1}{4}b$ is a root of $F$.

(ii) In general, let

$F(X) = X^n + a_{n-1}X^{n-1} + a_{n-2}X^{n-2} + \cdots + a_0$

be a polynomial of degree $n$.

(a) Show that the change of variable $X = x - \tfrac{1}{n}a_{n-1}$ produces a polynomial
$f$ with no term of degree $n - 1$,

$f(x) = F(x - \tfrac{1}{n}a_{n-1}) = x^n + q_{n-2}x^{n-2} + \cdots + q_0.$

(b) Show that if $u$ is a root of $f$, then $u - \tfrac{1}{n}a_{n-1}$ is a root of $F$.

3.7 Take It Further. Suppose that $g$ and $h$ are complex numbers and

$\omega = \tfrac{1}{2}(-1 + i\sqrt{3}).$

Show that

$g^3 + h^3 = (g + h)(\omega g + \omega^2 h)(\omega^2 g + \omega h).$

3.8 In Example 3.7, we found the roots of $x^4 - 10x^2 + 1$ by factoring it into two
quadratics (which came from taking, in the notation of page 87, $\ell = 1$ and $j^2 =
12$). Another choice was $\ell = -1$ and $j^2 = 8$.

(i) Using the alternate choice, get a different factorization of the quartic into 
quadratic factors. 

(ii) Show that the two factorizations produce the same linear factors. 

3.9 The following problem, from an old Chinese text, was solved by Qin Jiushao 
(Ch’in Chiu-shao) in 1247. There is a circular castle (see Figure 3.1) whose di- 
ameter is unknown; it is provided with four gates, and two lengths out of the 
north gate there is a large tree, which is visible from a point six lengths east of the 
south gate. What is the length of the diameter? (The answer is a root of a cubic 
polynomial.) 



Figure 3.1. Castle problem.


3.10 Show that there is no real number whose square is —1. 

3.11 (i) Find the roots of $x^3 - 3x + 1 = 0$.

(ii) Find the roots of $x^4 - 2x^2 + 8x - 3 = 0$.

3.12 Find a complex number $s$ so that $s^3 = 9 - 46i$.

3.13 Find the roots of $x^3 - 21x + 20$.

(i) by finding a root and reducing the cubic to a quadratic. 

(ii) by the cubic formula. 

(iii) Verify that the answers are the same. 

3.14 Suppose that $\alpha$ and $\beta$ are roots of the quadratic equation $x^2 + bx + c = 0$. Find
expressions in terms of $b$ and $c$ for

(i) $\alpha + \beta$

(ii) $\alpha^2 + \beta^2$

(iii) $\alpha^3 + \beta^3$

(iv) $(\alpha - \beta)^2$

(v) Use parts (i) and (iv) to derive the quadratic formula.

3.15 Suppose that $\alpha$, $\beta$, and $\gamma$ are roots of the cubic equation $x^3 + bx^2 + cx + d = 0$.
Show that

(i) $\alpha + \beta + \gamma = -b$

(ii) $\alpha\beta + \alpha\gamma + \beta\gamma = c$

(iii) $\alpha\beta\gamma = -d$.

3.16 (i) Show that a rectangle is determined by its perimeter and area.

(ii) Is a rectangular box determined by its volume and surface area? Explain.

3.17 Suppose that $\alpha$, $\beta$, and $\gamma$ are roots of the cubic equation $x^3 + bx^2 + cx + d = 0$.
Find, in terms of $b$, $c$, and $d$,

(i) $\alpha^2 + \beta^2 + \gamma^2$

(ii) $\alpha^3 + \beta^3 + \gamma^3$

(iii) $\alpha^2\beta^2\gamma + \alpha^2\beta\gamma^2 + \alpha\beta^2\gamma^2$

Exercise 3.17 can be done without the cubic formula.

3.18 Take It Further.

(i) Suppose that $\alpha$, $\beta$, and $\gamma$ are three numbers whose sum is 0. Show that

$[(\alpha - \beta)(\alpha - \gamma)(\beta - \gamma)]^2 + 4(\alpha\beta + \alpha\gamma + \beta\gamma)^3 + 27(\alpha\beta\gamma)^2 = 0.$

(ii) Suppose that $\alpha$, $\beta$, and $\gamma$ are roots of $x^3 + qx + r$. Show that

$-\bigl((\alpha - \beta)(\alpha - \gamma)(\beta - \gamma)\bigr)^2 = 27r^2 + 4q^3.$

The discriminant of the cubic $x^3 + qx + r$ is defined to be $\Delta = -4q^3 - 27r^2$.

3.19 Take It Further. When finding the roots of $x^3 + qx + r$ with the cubic formula,
you are led to Eq. (3.1): $x^2 + rx - \tfrac{1}{27}q^3$, whose roots are $g^3$ and $h^3$.

(i) Show that the discriminant $\delta$ of this quadratic is

$\delta = r^2 + \tfrac{4}{27}q^3.$

(ii) If $\delta > 0$, show that the cubic has one real root and two complex conjugate
roots.

(iii) If $\delta = 0$, show that the cubic has two real roots, one of them with
multiplicity 2.

(iv) If $\delta < 0$, show that the cubic has three distinct real roots.


3.2 Complex Numbers 

Before the cubic formula, mathematicians had no difficulty in ignoring neg- 
ative numbers or square roots of negative numbers. For example, consider 
the problem of finding the sides x and y of a rectangle having area A and 
perimeter $p$. The equations $xy = A$ and $2x + 2y = p$ give the equation
$2x^2 - px + 2A = 0$, and the quadratic formula gives

$x = \tfrac{1}{4}\left(p \pm \sqrt{p^2 - 16A}\right).$

If $p^2 - 16A \ge 0$, the problem is solved. If $p^2 - 16A < 0$, people didn’t
invent fantastic rectangles whose sides involve square roots of negative numbers.
Instead, they merely said that there is no rectangle whose area and perimeter 
are so related. But the cubic formula doesn’t allow us to avoid “imaginary” 
numbers, for we have just seen, in Example 3.5, that an “honest” real and posi- 
tive root can appear in terms of such expressions. Complex numbers arose, not 
as an attempt to get roots of equations involving square roots of negative real 
numbers, but as a device to solve cubic equations having real coefficients and 
real roots. 

The cubic formula was revolutionary. For the next 100 years, mathematicians
were forced to reconsider the meaning of number, calculating with strange
objects of the form $a + ib$ (where $a$ and $b$ are real numbers) as if they were
actual numbers enjoying the simplification rule $i^2 = -1$. It was during this
time that the terms real and imaginary arose. In this section and the next, we’ll
develop complex numbers in a more careful and formal way, and we’ll see that
complex numbers are as real as real numbers!

In Chapter 7, using ideas of abstract algebra, we’ll see that the naive way of
thinking about complex numbers, as polynomials in $i$ obeying the rule $i^2 = -1$,
can be made precise.

Right now, $a + bi$ is just an “alias” for $(a, b)$ but, in your previous experience
with $\mathbb{C}$, the summand $bi$ denoted the product of $b$ and $i$. We’ll soon recover
this notion.

You hear the following message when you call one of our friends. “The
number you have reached is imaginary; please rotate your phone 90 degrees.”

The Complex Plane 

When considering expressions of the form a + bi, it is natural to separate the 
two summands. Geometry rears its head. 

Definition. A complex number is an ordered pair $z = (a, b)$ of real numbers,
denoted by $z = a + bi$. We call $a$ the real part of $z$, denoting it by $\Re(z) = a$,
and $b$ the imaginary part of $z$, denoting it by $\Im(z) = b$.

Both the real and the imaginary parts of a complex number are real numbers.
Moreover, equality of ordered pairs says that complex numbers $z = a + bi$
and $z' = a' + b'i$ are equal if and only if $\Re(z) = \Re(z')$ and $\Im(z) = \Im(z')$;
that is, $a = a'$ and $b = b'$. Thus, one equation of complex numbers is the same
as two equations of real numbers.

There is an immediate geometric interpretation of complex numbers: they
can be viewed as points in the plane. Real numbers are complex numbers $z$
with $\Im(z) = 0$; that is, they correspond to points $(a, 0)$ on the $x$-axis (which is
called the real axis in this context). We usually abbreviate $(a, 0)$ to $a$; thus, the
set of real numbers $\mathbb{R}$ is a subset of $\mathbb{C}$. We denote the complex number $(0, 1)$
by $i$, so that the purely imaginary complex numbers $z$, those with $\Re(z) = 0$,
correspond to points on the $y$-axis (which is called the imaginary axis in this
context). When we view points as complex numbers, the plane $\mathbb{R}^2$ is called the
complex plane, and it is denoted by $\mathbb{C}$.

Thus, an ordered pair ( a,b ) of real numbers has two interpretations: alge- 
braic, as the complex number z = a + bi , and geometric, as the point P in 
the plane R 2 having coordinates a and b. We will use both interpretations, al- 
gebraic and geometric, depending on which is more convenient for the context 
in which we are working. 


A bijection is a one-to-one correspondence. See Appendix A.1, page 416,
for the precise definition.


Historical Note. Surprisingly, it took a very long time for people to embrace
the idea of representing the elements of $\mathbb{C}$ as points in the plane. It wasn’t until
Wessel presented a paper in 1797 to the Royal Danish Academy of Sciences,
entitled On the Analytic Representation of Direction: An Attempt, that this
representation crystallized. Wessel’s discovery was not adopted immediately but,
by 1830, most mathematicians routinely used the bijection $a + bi \leftrightarrow (a, b)$
between complex numbers and points of the plane. The complex plane has
gone by other names in its history: for example, Argand Diagram and
Gaussian Plane.


Algebraic Operations 

In Section 3.1, you saw that mathematicians were forced to add and multiply 
complex numbers. However, without precise definitions of the operations or of 
the complex numbers themselves, they could not trust many of their results. 
The complex plane allows us to resolve the many doubts our ancestors had
about the algebra of complex numbers.
In a linear algebra course, $\mathbb{R}^2$ is often viewed as a vector space with real
scalars; we continue using these operations in the complex plane.

Definition. Define addition $\mathbb{C} \times \mathbb{C} \to \mathbb{C}$ by

$(a + bi) + (c + di) = (a + c) + (b + d)i.$

In terms of ordered pairs, $(a, b) + (c, d) = (a + c, b + d)$.

Define scalar multiplication $\mathbb{R} \times \mathbb{C} \to \mathbb{C}$ by

$r(a + bi) = ra + rbi,$

where $r \in \mathbb{R}$. In terms of ordered pairs, $r(a, b) = (ra, rb)$.

As in linear algebra, it is useful to look at each point in the plane (and, 
hence, each complex number) as an arrow with tail at the origin (sometimes 
we say vector instead of arrow). For example, we’ll think of $z = 3 + 2i$ either
as the point $P = (3, 2)$ or as the arrow $\overrightarrow{OP}$ [where $O = (0, 0)$]. The context
will make it clear which interpretation we are using.

Addition is illustrated by the parallelogram law (see Figure 3.2). If $P =
(a, b)$ and $Q = (c, d)$, then $\overrightarrow{OP} + \overrightarrow{OQ} = \overrightarrow{OR}$, where $R = (a + c, b + d)$. Of
course, this needs a geometric proof, especially when points don’t lie in the
first quadrant or they are collinear; see Exercise 3.33 on page 98.



Scalar multiplication of complex numbers has the same geometric
interpretation as scalar multiplication of vectors. View a complex number $z = a + ib$
as $\overrightarrow{OP}$, where $P = (a, b)$. If $r \in \mathbb{R}$, then we may view $rz$ as the vector $r\overrightarrow{OP}$;
that is, if $r > 0$, then it’s an arrow in the same direction as $\overrightarrow{OP}$ whose length
has been stretched by a factor of $r$ (if $r > 1$) or shrunk by a factor of $r$ (if
$r < 1$); if $r < 0$, then $r\overrightarrow{OP}$ is the arrow in the reverse direction whose length
has been changed by a factor of $|r|$.

The eight properties listed in the next proposition are precisely the defining 
properties of a vector space with scalars in R. 

Proposition 3.8. Let $z = a + bi$, $w = c + di$, and $u = e + fi$ be complex
numbers, and suppose that $r, s \in \mathbb{R}$.

(i) $z + w = w + z$

(ii) $z + (w + u) = (z + w) + u$

(iii) $z + 0 = z$

(iv) There is a complex number $-z$ such that $z + (-z) = 0$

(v) $r(sz) = (rs)z$

(vi) $1z = z$

(vii) $r(z + w) = rz + rw$

(viii) $(r + s)z = rz + sz$

In Exercise 3.21 on page 98, you’ll show that negatives are unique.


Proof. The proofs are routine, just reducing each to a familiar statement about
real numbers, and so we’ll only prove the longest such: associativity of
addition. It is clearer if we use ordered pairs.

$$
\begin{aligned}
z + (w + u) &= (a, b) + [(c, d) + (e, f)] \\
&= (a, b) + (c + e, d + f) \\
&= (a + (c + e), b + (d + f)) \\
&= ((a + c) + e, (b + d) + f) \\
&= (a + c, b + d) + (e, f) \\
&= [(a, b) + (c, d)] + (e, f) \\
&= (z + w) + u. \quad ■
\end{aligned}
$$

In linear algebra, every vector $(a, b)$ has a decomposition into components
$ae_1 + be_2$ with respect to the standard basis $e_1 = (1, 0)$, $e_2 = (0, 1)$:

a + bi = (a, b) 

= (a, 0) + (0, b) 

= a(l,0) +b(0, 1). 

It follows that the + in the notation a + bi really does mean add and that bi is 
the product of b and i ; that is, bi = b( 0, 1) = (0, b). 

The set C of complex numbers has more algebraic structure: any two com- 
plex numbers can be multiplied, not just when one of them is real. The defini- 
tion arises from pretending that 

$$
\begin{aligned}
(a + bi)(c + di) &= ac + adi + bci + bdi^2 \\
&= (ac - bd) + i(ad + bc),
\end{aligned}
$$

where we have set $i^2 = -1$. This is precisely what our ancestors did, which
motivates the formal definition. But our definition involves no pretending.

Definition. Define multiplication $\mathbb{C} \times \mathbb{C} \to \mathbb{C}$ by

$(a + ib)(c + id) = (ac - bd) + i(ad + bc).$

In terms of ordered pairs, $(a, b)(c, d) = (ac - bd, ad + bc)$.


Notice that

$i^2 = (0, 1)(0, 1) = (-1, 0) = -1,$

for $ac = 0 = ad = bc$.
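The three definitions can be modeled directly on ordered pairs. The sketch
below is illustrative (the function names `add`, `scale`, and `mul` are
hypothetical); it represents $a + bi$ as the Python tuple `(a, b)` and uses
integer coordinates so that the checks are exact.

```python
def add(z, w):
    """(a, b) + (c, d) = (a + c, b + d)."""
    return (z[0] + w[0], z[1] + w[1])


def scale(r, z):
    """r(a, b) = (ra, rb) for a real scalar r."""
    return (r * z[0], r * z[1])


def mul(z, w):
    """(a, b)(c, d) = (ac - bd, ad + bc)."""
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + b * c)


i = (0, 1)
print(mul(i, i))                             # the pair representing i^2

z, w, u = (3, 2), (-1, 4), (5, -7)
print(mul(z, mul(w, u)) == mul(mul(z, w), u))  # one instance of associativity
print(mul((2, 0), z) == scale(2, z))           # real times complex is scalar multiplication
```

A finite check like this is no substitute for the proofs of Propositions 3.8
and 3.9, but it does confirm that $i^2$ is the pair $(-1, 0)$.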

We are now obliged to prove that the familiar properties of multiplication
actually do hold for complex multiplication.

Proposition 3.9. Let $z = a + bi$, $w = c + di$, and $u = e + fi$ be complex
numbers.

(i) $zw = wz$

(ii) $z(wu) = (zw)u$

(iii) $1z = z$

(iv) $z(w + u) = zw + zu$.

Proof. Again, the proofs are routine, for each reduces to a familiar statement
about real numbers. We’ll only prove associativity. As in the proof of
Proposition 3.8, it is clearer if we use ordered pairs.

$$
\begin{aligned}
z(wu) &= (a, b)[(c, d)(e, f)] \\
&= (a, b)(ce - df, de + cf) \\
&= (a(ce - df) - b(de + cf), b(ce - df) + a(de + cf)) \\
&= (ace - adf - bde - bcf, bce - bdf + ade + acf).
\end{aligned}
$$

On the other hand,

$$
\begin{aligned}
(zw)u &= [(a, b)(c, d)](e, f) \\
&= (ac - bd, bc + ad)(e, f) \\
&= ((ac - bd)e - (bc + ad)f, (bc + ad)e + (ac - bd)f) \\
&= (ace - bde - bcf - adf, bce + ade + acf - bdf).
\end{aligned}
$$

Hence, $z(wu) = (zw)u$. ■

The operations of addition and multiplication in C extend the definitions 
in R; for example, if r and s are real numbers, then their sum r + s is the same, 
whether you think of doing the addition in R or in C. See Exercise 3.22 on 
page 98. 

In Section 1.4, we displayed nine properties of addition and multiplication
in $\mathbb{R}$, and we have just seen that eight of them also hold in $\mathbb{C}$. We could now
define subtraction, as we did there, and prove results like $z(w - u) = zw - zu$,
$z \cdot 0 = 0$ for all complex numbers $z$, and the Binomial Theorem. We don’t have
to repeat all of this. As we said then, once this is established, the proofs of other
properties of addition and multiplication, such as $0 \cdot z = 0$ and $(-z)(-w) =
zw$, go through verbatim.

The ninth property describes reciprocals: If $z = a + bi \ne 0$, there is a
number $z^{-1}$ such that $z \cdot z^{-1} = 1$ (Exercise 3.21 on page 98 shows that such a
number $z^{-1}$, if it exists, is unique.) Here is an explicit formula for $z^{-1} = 1/z$
when $z = a + bi \ne 0$. If $b = 0$, then $z = a$ is a nonzero real number, and
we know it has a reciprocal $1/a$. If $b \ne 0$, we can easily find $x + yi$ so that
$(x + yi)(a + bi) = 1$. Multiply and equate real and imaginary parts:

$xa - yb = 1$ and $ya + xb = 0.$

The second equation gives $x = -ay/b$; substitute this into the first equation
and obtain

$y = -\dfrac{b}{a^2 + b^2}$ and $x = \dfrac{a}{a^2 + b^2}.$

In either case ($b = 0$ or $b \ne 0$),

$z^{-1} = \dfrac{a}{a^2 + b^2} - \dfrac{b}{a^2 + b^2}\,i.$

There is a more elegant derivation of this formula. The denominator $a^2 + b^2$
can be factored in $\mathbb{C}$:

$a^2 + b^2 = (a + bi)(a - bi).$

This leads to the useful notion of complex conjugate.

Definition. The complex conjugate $\overline{z}$ of a complex number $z = a + bi$ is
defined to be

$\overline{z} = a - bi.$

The function $\mathbb{C} \to \mathbb{C}$, given by $z \mapsto \overline{z}$, is called complex conjugation.

Hence, $a^2 + b^2 = z\overline{z}$.

If $z = (a, b)$, then $\overline{z} = (a, -b)$, so, geometrically, $\overline{z}$ is obtained from $z$ by
reflection in the real axis.

Complex conjugation interacts well with addition and multiplication. 

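Proposition 3.10. If $z = a + bi$ and $w = c + di$ are complex numbers, then

(i) $\overline{z + w} = \overline{z} + \overline{w}$

(ii) $\overline{zw} = \overline{z}\,\overline{w}$

(iii) $z \in \mathbb{R}$ if and only if $\overline{z} = z$

(iv) $\overline{\overline{z}} = z$


Proof. We’ll prove (i), leaving the rest to Exercise 3.25 on page 98.

$$
\begin{aligned}
\overline{z + w} &= \overline{(a + c) + (b + d)i} \\
&= (a + c) - (b + d)i \\
&= (a - bi) + (c - di) \\
&= \overline{z} + \overline{w}. \quad ■
\end{aligned}
$$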

Using induction, the first two statements in the proposition can be
generalized:

$\overline{z_1 + \cdots + z_n} = \overline{z_1} + \cdots + \overline{z_n}$

$\overline{z_1 \cdots z_n} = \overline{z_1} \cdots \overline{z_n}.$

The formula for the multiplicative inverse of a complex number can be
written in terms of conjugates. Informally, cancel $\overline{z}$ to see that $\overline{z}/z\overline{z} = 1/z$.


Proposition 3.11. Every nonzero complex number $z = a + bi$ has an inverse:

$z^{-1} = \dfrac{a - bi}{a^2 + b^2} = \dfrac{\overline{z}}{z\overline{z}}.$

Proof. It’s enough to see that you get 1 if you multiply $z$ by $(a - bi)/(a^2 + b^2)$.
And so it is:

$z \cdot \dfrac{a - bi}{a^2 + b^2} = \dfrac{(a + bi)(a - bi)}{a^2 + b^2} = \dfrac{a^2 + b^2}{a^2 + b^2} = 1.$ ■
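The reciprocal formula can be checked exactly on ordered pairs. This sketch is
an illustration only: the function names are hypothetical, and it assumes
integer coordinates so that Python’s `Fraction` type gives exact rational
arithmetic.

```python
from fractions import Fraction


def mul(z, w):
    """(a, b)(c, d) = (ac - bd, ad + bc)."""
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + b * c)


def conj(z):
    """Complex conjugate: (a, b) -> (a, -b)."""
    a, b = z
    return (a, -b)


def inverse(z):
    """z^{-1} = conj(z) / (z * conj(z)) for a nonzero pair z = (a, b).

    Assumes integer (or rational) coordinates, an assumption of this sketch.
    """
    a, b = z
    norm = mul(z, conj(z))[0]        # z * conj(z) = (a^2 + b^2, 0)
    if norm == 0:
        raise ZeroDivisionError("0 has no reciprocal")
    return (Fraction(a, norm), Fraction(-b, norm))


z = (3, 2)
print(inverse(z))
print(mul(z, inverse(z)) == (1, 0))
```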



It wouldn’t be worth introducing the new term complex conjugation if our 
only use of it was to give a neat proof of the formula for reciprocals. The notion 
has many other uses as well. For example, if $f(x) = ax^2 + bx + c$ has real
coefficients, then the quadratic formula implies that whenever $z$ is a complex
root of $f$, then so is $\overline{z}$. In fact, this is true for polynomials of any degree, and
the proof depends only on Proposition 3.10.

Theorem 3.12. If $f(x)$ is a polynomial with real coefficients and a complex
number $z$ is a root of $f$, then so is $\overline{z}$.

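Proof. Suppose that

$f(x) = a_0 + a_1 x + \cdots + a_i x^i + \cdots + a_n x^n,$

where each $a_i \in \mathbb{R}$. Saying that $z$ is a root means that

$0 = a_0 + a_1 z + \cdots + a_i z^i + \cdots + a_n z^n.$

Hence,

$$
\begin{aligned}
0 = \overline{0} &= \overline{a_0 + a_1 z + \cdots + a_i z^i + \cdots + a_n z^n} \\
&= \overline{a_0} + \overline{a_1 z} + \cdots + \overline{a_i z^i} + \cdots + \overline{a_n z^n} \\
&= \overline{a_0} + \overline{a_1}\,\overline{z} + \cdots + \overline{a_i}\,\overline{z}^{\,i} + \cdots + \overline{a_n}\,\overline{z}^{\,n} \\
&= a_0 + a_1 \overline{z} + \cdots + a_i \overline{z}^{\,i} + \cdots + a_n \overline{z}^{\,n} \quad \text{(because all } a_i \text{ are real)} \\
&= f(\overline{z}).
\end{aligned}
$$

Therefore, $\overline{z}$ is a root of $f$. ■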
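The key step of the proof, $\overline{f(z)} = f(\overline{z})$ when the coefficients are real, can be
observed numerically. The sketch below is an illustration: the function name
`poly_eval` and the sample polynomials are hypothetical choices, and Python’s
built-in `complex` type stands in for $\mathbb{C}$.

```python
def poly_eval(coeffs, z):
    """Evaluate a_0 + a_1 z + ... + a_n z^n by Horner's rule,
    where coeffs = [a_0, a_1, ..., a_n]."""
    result = 0
    for a in reversed(coeffs):
        result = result * z + a
    return result


# f(x) = x^2 - 2x + 5 has real coefficients and the root z = 1 + 2i
f = [5, -2, 1]
z = complex(1, 2)
print(poly_eval(f, z), poly_eval(f, z.conjugate()))

# For real coefficients, f(conj(z)) equals conj(f(z)) at any point
g = [-126, -15, 0, 1]                 # x^3 - 15x - 126 from Example 3.4
w = complex(2, 1)
print(poly_eval(g, w.conjugate()) == poly_eval(g, w).conjugate())
```

Both $z = 1 + 2i$ and $\overline{z} = 1 - 2i$ evaluate to zero, as Theorem 3.12 predicts.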

Exercises 

3.20 In Appendix A.4, we considered the subset $P$ of all (strictly) positive real
numbers; it satisfies:

• if $a, b \in P$, then $a + b \in P$ and $ab \in P$;

• if $r \in \mathbb{R}$, then exactly one of the following is true:

$r \in P$, $r = 0$, or $-r \in P$.

(i) Using only the two properties of $P$, prove that if $a \in \mathbb{R}$, then either $a = 0$ or
$a^2 \in P$.

(ii) Prove that there is no subset $Q \subseteq \mathbb{C}$, closed under addition and multiplication,
such that if $z \in \mathbb{C}$, then exactly one of the following is true:

$z \in Q$, $z = 0$, or $-z \in Q$.

Conclude that it’s impossible to order the complex numbers in a way that
preserves the basic rules for inequality listed in Proposition A.51.


Is this true if z is real? 


If we define $a < b$ to mean $b - a \in P$, then we can prove all the familiar
properties of inequality. For example, if $a < b$ and $c < 0$, then $bc < ac$. See
page 441.

3.21 * Suppose that $z$ is a complex number. Generalize Propositions 1.33 and 1.32.

(i) Show that $-z$ is unique.

(ii) Show that $-z = (-1)z$.

(iii) If $z \ne 0$, show that $z^{-1}$ is unique.

3.22 * 

(i) We may think of real numbers $r$ and $s$ as complex numbers. Show that their
sum $r + s$ and their product $rs$ in $\mathbb{C}$ are the same as their sum and product
in $\mathbb{R}$.

(ii) If $z$ is a complex number and $r$ is a real number, you can think of the complex
number $rz$ in two ways: as the product of scalar multiplication, or as the
product of two complex numbers $r + 0i$ and $z$. Show that the two calculations
give the same result.

3.23 If $z \in \mathbb{C}$, show that $z + \overline{z} = 2(\Re z)$ and $z - \overline{z} = 2i(\Im z)$.

3.24 Find a complex number $z$ such that $z + \overline{z} = 14$ and $z\overline{z} = 49$.

3.25 * Finish the proof of Proposition 3.10. If $z$ and $w$ are complex numbers, prove

(i) $\overline{zw} = \overline{z}\,\overline{w}$

(ii) $\overline{\overline{z}} = z$

(iii) $\overline{z} = z$ if and only if $z$ is a real number

3.26 If $z$ is a complex number and $n$ is a natural number, show that

$$\overline{z^n} = (\overline{z})^n.$$

Is this equation true if $z \ne 0$ and $n$ is a negative integer?

3.27 * Let z be a complex number and r a real number. Show how to locate rz in the 
complex plane in terms of z. 

Hint. $r(a, b) = (ra, rb)$.

3.28 Solve the following equation for $z$.

$$(3 + 2i)z = -3 + 11i.$$

3.29 Find real numbers $a$ and $b$ such that

(i) $a + bi = (8 + i)/(3 + 2i)$

(ii) $a + bi = (8 + i)/(3 + i)$

(iii) $(a + bi)^2 = -5 + 12i$

(iv) $(a + bi)^2 = 1 + i$.


3.30 What’s wrong with this “proof” that $6 = -6$?

$$6 = \sqrt{36} = \sqrt{(-9)(-4)} = \sqrt{-9}\,\sqrt{-4} = 3i \cdot 2i = 6i^2 = -6.$$

3.31 Establish the identity

$$(a^2 + b^2)(c^2 + d^2) = (ac - bd)^2 + (bc + ad)^2$$

for all complex numbers $a, b, c, d$.

3.32 Use Theorem 3.12 to prove that every cubic polynomial with real coefficients has 
a real root. 

3.33 * Let $z = a + bi$ and $w = c + di$. Show that in the complex plane $z + w$ is the
fourth vertex of the (possibly degenerate) parallelogram whose other vertices are
0, $z$, and $w$.

Absolute Value and Direction

We’ve already seen that addition can be viewed as the parallelogram law, multi- 
plication by a real number can be viewed as scalar multiplication, and conjuga- 
tion can be viewed as reflection in the real axis. There is a beautiful geometric 
interpretation of complex multiplication; it is best understood if, first, we con- 
sider a different way to describe an arrow in the complex plane using absolute 
value and direction. 

Definition. The absolute value (or length or modulus) of $z = a + bi$ is

$$|z| = \sqrt{a^2 + b^2}.$$

The absolute value of a real number is its distance to the origin, and so we
have just extended the notion of absolute value from $\mathbb{R}$ to $\mathbb{C}$. Thus, if $z =
a + bi$, then $|z|$ is the distance from the point $P = (a, b)$ to the origin $O$;
equivalently, it is the length of the arrow $OP$. Because $z\overline{z} = a^2 + b^2$, we can
write

$$|z| = \sqrt{z\,\overline{z}}.$$



Proposition 3.13. Let $z = a + bi$ and $w = c + di$.

(i) $|z| \ge 0$, and $|z| = 0$ if and only if $z = 0$.

(ii) (Triangle Inequality). $|z + w| \le |z| + |w|$.

(iii) $|zw| = |z|\,|w|$.

Proof. (i) Both statements follow from the definition $|z| = \sqrt{a^2 + b^2}$,
because $a^2 + b^2 = 0$ if and only if $a = 0 = b$.

(ii) If $P = (a, b)$ and $Q = (c, d)$, then $z$ is the arrow $OP$ and $w$ is the arrow
$OQ$. As in Figure 3.2, $z + w = OR$. The inequality we want is the usual
triangle inequality, which follows from the length of a line segment being
the shortest distance between its endpoints.

(iii) Using $|z| = \sqrt{z\,\overline{z}}$ and Proposition 3.10,

$$\begin{aligned}
|zw| &= \sqrt{(zw)\,\overline{(zw)}} \\
&= \sqrt{(zw)\,(\overline{z}\,\overline{w})} \\
&= \sqrt{(z\,\overline{z})\,(w\,\overline{w})} \\
&= \sqrt{z\,\overline{z}}\,\sqrt{w\,\overline{w}} \\
&= |z|\,|w|.
\end{aligned}$$ ■

What do we mean by direction? The most natural way to indicate direction
in the plane is to point (“He went thataway!”): an arrow shows the way. The
arrow may as well have its tail at the origin and, since the length of the arrow
doesn’t affect the direction, we may as well assume it is a unit vector, that is,
it has length 1. If we denote the tip of the arrow by $P = (a, b)$, then $P$ lies
on the unit circle. There are various geometric ways to describe $P$. One way
is to consider the angle $\theta$ between the x-axis and $OP$; hence, $P = (a, b) =
(\cos\theta, \sin\theta)$. This angle can be described with degrees; the ancients divided
the circle into 360 equal degrees. The angle can also be described with radians;
the circumference of the unit circle is $2\pi$, and $\theta$ is the length of the arc from
$(1, 0)$ to $P$. When we view the point $P = (\cos\theta, \sin\theta)$ on the unit circle as a
complex number, it is equal to $\cos\theta + i\sin\theta$.

Does this equation hold if $z$ is a real number?

Make sure you can justify each step in the proof. Would this proof work if
either $z$ or $w$ (or both) is real?

Why did our ancestors divide a circle into 360 parts? We can only guess
why. Perhaps it was related to calendars, for a year has about 360 days.

Figure 3.3 shows $z = a + bi$ as the tip of an arrow of length $|z| = r$. The
direction of this arrow is the same as the direction of the unit vector $OP$ having
the same direction as $z$. If $\theta$ is the angle between the x-axis and $OP$, then the
coordinates of $P$ are $|OA| = \cos\theta$ and $|AP| = \sin\theta$.



Definition. If $z$ is a nonzero complex number, then its argument, denoted by

$$\arg(z),$$

is the counterclockwise angle $\theta$ from the positive real axis to $OP$.


Finding arg(z) requires 
some way of comput- 
ing values of inverse 
trigonometric functions. 
Nowadays, we use com- 
puters; in earlier times, 
tables of values of cosine 
and sine were used. Fairly 
accurate trigonometric 
tables were known over 
two thousand years ago. 


In Figure 3.3, we see that the coordinates of $P$ are $\cos\theta$ and $\sin\theta$; that is,
if $P = (a, b)$, then $a = \cos\theta$ and $b = \sin\theta$. Thus, for any nonzero complex
number $z = a + bi$, not necessarily of absolute value 1, the definitions of
cosine and sine (in terms of right triangles) give $\arg(z) = \theta$, where $\cos\theta =
a/|z|$ and $\sin\theta = b/|z|$. Note that

$$z = a + bi = |z|\left(\frac{a}{|z|} + i\,\frac{b}{|z|}\right) = |z|\,(\cos\theta + i\sin\theta). \tag{3.4}$$

How to Think About It. Technically, the argument of a complex number
is only determined up to a multiple of 360 (if we measure in degrees) or $2\pi$
(if we measure in radians). For example, $\arg(1 + i)$ is 45° or $\pi/4$ radians, and
this is the same direction as 405° or $9\pi/4$ radians. There is a fussy way to make
statements like “$\arg(1 + i) = 45°$” precise (introduce a suitable equivalence
relation), but we prefer, as do most people, to be a bit sloppy here; the cure is
worse than the disease.

Actually, the trig functions do make angles precise, for $\cos\theta$ and $\sin\theta$ have
the same values when $\theta$ is replaced by either $\theta + 360n$ degrees or $\theta + 2\pi n$
radians.


The polar form of $z$ is

$$z = |z|\,(\cos\theta + i\sin\theta),$$

where $\theta = \arg(z)$. Just as a complex number $z$ is determined by its real and
imaginary parts, so too is it determined by its polar form: its absolute value
and argument.

Proposition 3.14 (Polar Form). Every complex number $z$ has a polar form:

$$z = r(\cos\theta + i\sin\theta),$$

where $r \ge 0$ and $0 \le \theta < 2\pi$. If $z \ne 0$, then this expression is unique.

We may paraphrase uniqueness of polar forms: two vectors are equal if
and only if they have the same length and the same direction.

Proof. Existence is given by Eq. (3.4). Uniqueness is almost obvious. Suppose
that $z = r(\cos\theta + i\sin\theta)$ for some $r > 0$ and $\theta$. Then

$$|z| = |r(\cos\theta + i\sin\theta)| = |r|\,|\cos\theta + i\sin\theta| = |r| = r,$$

since $r > 0$ and $|\cos\theta + i\sin\theta| = \sqrt{\cos^2\theta + \sin^2\theta} = 1$; thus, $r = |z|$. But
then

$$\cos\theta + i\sin\theta = \frac{1}{r}\,z = \frac{1}{|z|}\,z = \frac{a}{|z|} + i\,\frac{b}{|z|},$$

so that

$$\cos\theta = \frac{a}{|z|} \quad\text{and}\quad \sin\theta = \frac{b}{|z|},$$

and $\theta = \arg(z)$. ■

Example 3.15. If $z = 3 + 4i$, the Pythagorean Theorem gives $|z|^2 = 3^2 +
4^2 = 25$, so that $|z| = 5$; your favorite computer gives $\arg(z) = \theta =
\cos^{-1}(\tfrac{3}{5}) \approx 53.13°$. Thus, the polar form of $z$ is

$$z = 5(\cos\theta + i\sin\theta) \approx 5\,(\cos 53.13° + i\sin 53.13°).$$ ▲
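For readers who want to compute polar forms by machine, here is a minimal Python sketch, assuming only the standard `cmath` and `math` modules. The wrapper name `polar_form` is ours; note that `cmath.polar` reports the angle in the interval $[-\pi, \pi]$, so the sketch shifts it into the range $0 \le \theta < 2\pi$ used in Proposition 3.14.

```python
import cmath
import math


def polar_form(z: complex) -> tuple[float, float]:
    """Return (r, theta) with z = r(cos theta + i sin theta),
    where r >= 0 and 0 <= theta < 2*pi (theta in radians)."""
    r, theta = cmath.polar(z)            # theta lies in [-pi, pi]
    return r, theta % (2 * math.pi)      # move it into [0, 2*pi)


r, theta = polar_form(complex(3, 4))
print(r)                       # the absolute value of 3 + 4i
print(math.degrees(theta))     # the argument in degrees, about 53.13

# Rebuilding z from its polar form recovers the original number.
rebuilt = complex(r * math.cos(theta), r * math.sin(theta))
print(rebuilt)
```

For $z = 3 + 4i$ this reproduces the absolute value 5 and the angle of about 53.13° found with the inverse cosine.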


The Geometry Behind Multiplication

We’re ready to give a geometric interpretation of complex multiplication.
Proposition 3.13(iii) tells part of the story: the absolute value of a product is the
product of the absolute values. To finish the geometric analysis of multiplication,
we need to know how $\arg(zw)$ is related to $\arg(z)$ and $\arg(w)$. We may as well
assume that $z$ and $w$ are unit vectors (why?), so that $z = \cos\alpha + i\sin\alpha$ and
$w = \cos\beta + i\sin\beta$. Multiply them together and collect real and imaginary
parts.

$$zw = (\cos\alpha + i\sin\alpha)(\cos\beta + i\sin\beta) \tag{3.5}$$

$$\phantom{zw} = (\cos\alpha\cos\beta - \sin\alpha\sin\beta) + i(\cos\alpha\sin\beta + \sin\alpha\cos\beta). \tag{3.6}$$

Do $\Re(zw)$ and $\Im(zw)$ look familiar? They are the addition formulas for cosine
and sine:

$$\cos(\alpha + \beta) = \cos\alpha\cos\beta - \sin\alpha\sin\beta$$

and

$$\sin(\alpha + \beta) = \cos\alpha\sin\beta + \sin\alpha\cos\beta.$$

We know that $zw$ sits on a circle of radius $|z|\,|w|$, centered at the origin. But
where?

These formulas will give a beautiful characterization of the product of two
complex numbers. We now prove them, beginning with a familiar lemma that
uses Figure 3.4.

We are proving that perpendicularity is equivalent to the dot product being 0.

In Figure 3.4, the coordinates of $U$ are labeled $(c, d)$. This lemma shows
that $(c, d) = (-s, r)$.


Lemma 3.16. If $W = (r, s)$ and $U = (c, d)$, then the arrows $OW$ and $OU$
are perpendicular if and only if $rc + sd = 0$.

Proof. We use the Pythagorean Theorem and its converse, Exercise 1.11 on
page 7: $OU \perp OW$ if and only if $|UW|^2 = |OU|^2 + |OW|^2$. Let $h = |UW|$;
then

$$\begin{aligned}
h^2 &= (r - c)^2 + (s - d)^2 \\
&= r^2 - 2rc + c^2 + s^2 - 2sd + d^2 \\
&= r^2 + s^2 + c^2 + d^2 - 2(rc + sd).
\end{aligned}$$

But $|OW|^2 = r^2 + s^2$ and $|OU|^2 = c^2 + d^2$. Hence,

$$|OW|^2 + |OU|^2 = (r^2 + s^2) + (c^2 + d^2)$$

and

$$h^2 = r^2 + s^2 + c^2 + d^2 - 2(rc + sd).$$

Therefore, $h^2 = |OU|^2 + |OW|^2$ if and only if $rc + sd = 0$. ■

Now for the addition formulas.


Theorem 3.17 (Addition Theorem). Let $\alpha$ and $\beta$ be angles.

(i) $\cos(\alpha + \beta) = \cos\alpha\cos\beta - \sin\alpha\sin\beta$.

(ii) $\sin(\alpha + \beta) = \cos\alpha\sin\beta + \sin\alpha\cos\beta$.

We are looking at points here as elements of $\mathbb{R}^2$, although we’ll soon interpret
this diagram in the complex plane.


Proof. In Figure 3.5, we have a picture of the unit circle. Let $Z = (a, b) =
(\cos\alpha, \sin\alpha)$ and $W = (r, s) = (\cos\beta, \sin\beta)$. Rotate $\triangle OQZ$ counterclockwise
through the angle $\beta$ to get $\triangle OQ'Z'$, so that $\triangle OQZ$ and $\triangle OQ'Z'$ are congruent.
Thus, $Z' = (\cos(\alpha + \beta), \sin(\alpha + \beta))$. Our task is to find the coordinates
of $Z'$ in terms of $r$, $s$, $a$, and $b$.

Define $U = (-s, r)$. Since $W = (r, s)$ is on the unit circle, we have
$r^2 + s^2 = 1$, and so $U = (-s, r)$ is also on the unit circle. Moreover, since
$(-s)r + rs = 0$, Lemma 3.16 says that $OU$ is orthogonal to $OW$. Therefore,
$OQ'Z'M$ is a rectangle.

Decompose $OZ'$ as the sum of two vectors:

$$OZ' = OQ' + OM,$$

where $OM$ is the projection of $Q'Z'$ onto $OU$. We can get explicit expressions
for $Q'$ and $M$. First, $OQ'$ is a scalar multiple of $OW$ and, because
$|OQ| = a$, we know the scalar:

$$Q' = a(r, s) = (ar, as).$$

Second, $OM$ is a scalar multiple of $OU$, where $U = (-s, r)$; as $|OM| =
|QZ| = b$, we know the scalar:

$$M = b(-s, r) = (-bs, br).$$

Therefore,

$$OZ' = OQ' + OM = (ar, as) + (-bs, br) = (ar - bs, as + br).$$

Making the appropriate substitutions for $a$, $b$, $r$, and $s$, we have the desired
result:

$$\cos(\alpha + \beta) = \cos\alpha\cos\beta - \sin\alpha\sin\beta$$

and

$$\sin(\alpha + \beta) = \cos\alpha\sin\beta + \sin\alpha\cos\beta.$$ ■

Here is the result we have been seeking. 


Theorem 3.18 (The Geometry of Multiplication). If $z$ and $w$ are complex
numbers, then

(i) $|zw| = |z|\,|w|$, and

(ii) $\arg(zw) = \arg(z) + \arg(w)$.

Proof. The first statement is Proposition 3.13, and the second follows from
Theorem 3.17 and Eq. (3.5) on page 101. ■

In words, the length of a product is the product of the lengths, and the
argument of a product is the sum of the arguments. The equality in
Theorem 3.18(ii) holds up to a multiple of $2\pi$.

If we set $\arg(z) = \alpha$ and $\arg(w) = \beta$, then Theorem 3.18 has an especially
pleasing restatement in polar form.

Corollary 3.19. If $z = |z|(\cos\alpha + i\sin\alpha)$ and $w = |w|(\cos\beta + i\sin\beta)$, then

$$z \cdot w = |zw|\,(\cos(\alpha + \beta) + i\sin(\alpha + \beta)).$$

Proof. Both sides equal $|z|\,(\cos\alpha + i\sin\alpha) \cdot |w|\,(\cos\beta + i\sin\beta)$. ■

It follows easily, by induction on $k \ge 1$, that if $z$ is a complex number and
$k$ is a positive integer, then

$$|z^k| = |z|^k \quad\text{and}\quad \arg(z^k) = k\arg(z).$$
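The two halves of the geometric description of multiplication (lengths multiply, angles add) can be observed numerically. The following Python sketch is an illustration only; the helper `arg` is our own name, it measures angles in radians in the range $[0, 2\pi)$, and the comparison of angles is made modulo $2\pi$, in keeping with the remark that arguments are determined only up to a multiple of $2\pi$.

```python
import cmath
import math


def arg(z: complex) -> float:
    """Argument of a nonzero complex number, in radians, in [0, 2*pi)."""
    return cmath.phase(z) % (2 * math.pi)


z = complex(-1, 1)       # length sqrt(2), argument 135 degrees
w = complex(0, 2)        # length 2, argument 90 degrees
product = z * w          # equals -2 - 2i

# Lengths multiply.
length_gap = abs(abs(product) - abs(z) * abs(w))

# Arguments add (compare modulo 2*pi).
angle_gap = abs(arg(product) - (arg(z) + arg(w)) % (2 * math.pi))

print(product, math.degrees(arg(product)))
print(length_gap, angle_gap)     # both should be 0 up to rounding
```

Here $135° + 90° = 225°$, which is the direction of $-2 - 2i$.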


This approach does not 
seem to be very well 
known. It appears in [22] 
and Kerins, B. “Gauss, 
Pythagoras, and Heron’’ 
(Mathematics Teacher, 
96:5, 2003), but we can’t 
find any older sources. 


How to Think About It. There’s a way to see, without using trigonometry,
that angles add in the product of two complex numbers. Essentially, we recast
the proof of Theorem 3.17 in terms of complex numbers. Given $z = a + bi$
and $w = r + si$, we want to determine $\arg(zw)$ in terms of $\arg(z) = \alpha$ and
$\arg(w) = \beta$. We can assume that $z$ and $w$ are unit vectors; this implies that $zw$
is also a unit vector, by Proposition 3.13. The key insight is that

$$zw = (a + bi)w = aw + (bi)w = aw + b(iw).$$

You know the geometric effect of scalar multiplication (Exercise 3.27 on page 98),
you know how to add geometrically (parallelogram law), and you know

$$iw = i(r + si) = -s + ri;$$

using Lemma 3.16, it follows that $iw$ is obtained from $w$ by counterclockwise
rotation by 90°. Figure 3.6 below is almost the same as Figure 3.5; the
difference is that points are now labeled as complex numbers.



Figure 3.6. Complex multiplication again.

Let’s put this all together. Triangle $OZ'Q'$ is congruent to triangle $OZQ$,
so that $\angle Z'OW = \angle ZOQ = \alpha$; hence, $\arg(zw) = \alpha + \beta$. We have shown
that Theorem 3.18(ii) follows without any mention of trigonometry.

We have just used plane geometry to derive the geometric interpretation of
complex multiplication, avoiding the trigonometric addition formulas. Aside
from proving these ideas for students who haven’t yet seen the addition
formulas, complex numbers can now be used to derive these formulas. That’s an
additional bonus, especially for a precalculus class; it allows us to use complex
numbers when trying to establish other trigonometric identities that depend on
the addition formulas. For example, to get a formula for $\cos(\tfrac{\pi}{4} + \theta)$, calculate
like this:

$$\begin{aligned}
\cos(\tfrac{\pi}{4} + \theta) + i\sin(\tfrac{\pi}{4} + \theta) &= (\cos\tfrac{\pi}{4} + i\sin\tfrac{\pi}{4})(\cos\theta + i\sin\theta) \\
&= \tfrac{1}{\sqrt{2}}(1 + i)(\cos\theta + i\sin\theta) \\
&= \tfrac{1}{\sqrt{2}}\left((\cos\theta - \sin\theta) + i(\cos\theta + \sin\theta)\right).
\end{aligned}$$

Hence

$$\cos(\tfrac{\pi}{4} + \theta) = \tfrac{1}{\sqrt{2}}(\cos\theta - \sin\theta)$$

and, as a bonus,

$$\sin(\tfrac{\pi}{4} + \theta) = \tfrac{1}{\sqrt{2}}(\cos\theta + \sin\theta).$$


Over the next two centuries, people became comfortable with the fact that
polynomial equations with real coefficients can have complex solutions. It was
eventually proved that every polynomial $f(x) = x^n + c_{n-1}x^{n-1} + \cdots + c_1 x +
c_0$ with real coefficients has a factorization

$$f(x) = (x - \alpha_1) \cdots (x - \alpha_n),$$

where $\alpha_1, \ldots, \alpha_n$ are complex numbers. This amazing result holds for any
nonzero polynomial $f$ with complex coefficients; it is known as the Fundamental
Theorem of Algebra. We won’t prove this result here because, in spite
of its name, it is a theorem of analysis, not of algebra; you can find a readable
account in [4], pp. 142-152.


Exercises 

3.34 If $z = a + bi$, prove that the arrow corresponding to $z$, namely $OP$, where
$P = (a, b)$, is perpendicular to the arrow corresponding to $iz$.

3.35 If $z$ and $w$ are complex numbers with $w \ne 0$, show that

$$\arg(z/w) = \arg(z) - \arg(w).$$

3.36 (i) Prove that the quadratic formula holds for polynomials with complex
coefficients (use Proposition 3.6).

(ii) Find the roots of $x^2 + 2ix - 1$. Why aren’t these roots conjugate?

3.37 If $z$ and $w$ are complex numbers, find a necessary and sufficient condition for
$|z + w|$ to equal $|z| + |w|$.

3.38 * If $z = \cos\alpha + i\sin\alpha$, show that

$$\overline{z} = 1/z = \cos(-\alpha) + i\sin(-\alpha).$$

3.39 Let $n > 0$ be an integer and $\zeta = \cos(\tfrac{2\pi}{n}) + i\sin(\tfrac{2\pi}{n})$. If $z$ is a complex number,
give a geometric description of how $\zeta z$ is located with respect to $z$ on the complex
plane.

3.40 Preview. Plot the roots in the complex plane for each of the polynomials x 2 — 1, 
x 3 — 1, x 4 — 1, x 6 — 1, and x 12 — 1. 

3.41 Preview. Let $\zeta = \cos(45°) + i\sin(45°)$.

(i) Show that $\zeta^8 = 1$.

(ii) Show that the distinct roots of $x^8 - 1$ are precisely $1, \zeta, \zeta^2, \ldots, \zeta^7$.

(iii) Plot these roots in the complex plane.

(iv) Show that $\zeta^{147}$ is a root of $x^8 - 1$.

(v) Show that $\zeta^{147}$ is equal to one of the roots in part (ii). Which one?

3.42 Preview. Let $n$ be a positive integer and $\zeta = \cos(\tfrac{2\pi}{n}) + i\sin(\tfrac{2\pi}{n})$.

(i) Show that $\zeta^n = 1$.

(ii) If $k$ is any nonnegative integer, show that $(\zeta^k)^n = 1$.

(iii) Give a geometric description of the subset $\{\zeta^k : k \ge 0\}$ of the complex plane.

3.43 Take It Further. If $f(x)$ is a polynomial with complex coefficients, define $\overline{f}$ to
be the polynomial you get by replacing each coefficient in $f$ by its conjugate.
Prove the following statements.

(i) $\overline{f + g} = \overline{f} + \overline{g}$.

(ii) $\overline{fg} = \overline{f}\,\overline{g}$.

(iii) $\overline{f} = f$ if and only if $f(x)$ has real coefficients.

(iv) $f\,\overline{f}$ has real coefficients.

(v) $\overline{f(z)} = \overline{f}(\overline{z})$.

3.44 Take It Further. Suppose $f(x)$ is a polynomial with coefficients in $\mathbb{C}$. If a
complex number $z$ is a root of $f$, show that $\overline{z}$ is a root of $\overline{f}$.

3.45 Take It Further. Suppose $f(x)$ is a polynomial with coefficients in $\mathbb{C}$. Then
by Exercise 3.43 on page 106, if we define $g(x) = f(x)\,\overline{f}(x)$, then $g(x)$ has
coefficients in $\mathbb{R}$. Show that if $g(z) = 0$, either $f(z) = 0$ or $f(\overline{z}) = 0$. Hence
conclude that if every polynomial with real coefficients and degree at least 1 has
a root in $\mathbb{C}$, then every polynomial with complex coefficients and degree at least
1 has a root in $\mathbb{C}$. (The Fundamental Theorem of Algebra says that every polynomial
with complex coefficients has all its roots in $\mathbb{C}$. This exercise shows that it’s
enough to prove this for polynomials with real coefficients.)

3.3 Roots and Powers 

We saw in the previous section that every point $z$ on the unit circle can be
written as $z = \cos\theta + i\sin\theta$ for some angle $\theta$. Theorem 3.18 tells us that
arguments add when complex numbers are multiplied. In particular,

$$\begin{aligned}
(\cos\theta + i\sin\theta)^2 &= (\cos\theta + i\sin\theta)(\cos\theta + i\sin\theta) \\
&= \cos(\theta + \theta) + i\sin(\theta + \theta) \\
&= \cos(2\theta) + i\sin(2\theta).
\end{aligned}$$

On the other hand, complex multiplication gives

$$(\cos\theta + i\sin\theta)^2 = (\cos^2\theta - \sin^2\theta) + i\,2\cos\theta\sin\theta.$$

Equating real parts and imaginary parts gives the double angle formulas:

$$\cos(2\theta) = \cos^2\theta - \sin^2\theta$$

$$\sin(2\theta) = 2\cos\theta\sin\theta.$$

We now generalize this to any positive integer power. 

Theorem 3.20 (De Moivre). For every angle $\theta$ and all integers $n \ge 0$,

$$(\cos\theta + i\sin\theta)^n = \cos(n\theta) + i\sin(n\theta).$$

Proof. We prove equality by induction on $n \ge 0$. The theorem is true when
$n = 0$, for $\cos 0 = 1$ and $\sin 0 = 0$. Here is the inductive step.

$$\begin{aligned}
(\cos\theta + i\sin\theta)^n &= (\cos\theta + i\sin\theta)^{n-1}(\cos\theta + i\sin\theta) \\
&= \left(\cos((n-1)\theta) + i\sin((n-1)\theta)\right)(\cos\theta + i\sin\theta) \\
&= \cos((n-1)\theta + \theta) + i\sin((n-1)\theta + \theta) \\
&= \cos(n\theta) + i\sin(n\theta).
\end{aligned}$$ ■

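Example 3.21. (i)

$$(\cos 3° + i\sin 3°)^{40} = \cos 120° + i\sin 120° = -\tfrac{1}{2} + i\,\tfrac{\sqrt{3}}{2}.$$

(ii) Let $z = \cos(45°) + i\sin(45°) = \tfrac{1}{\sqrt{2}}(1 + i)$. We compute $z^6$ in two ways:

• With the Binomial Theorem:

$$\begin{aligned}
z^6 &= \left(\tfrac{1}{\sqrt{2}}(1 + i)\right)^6 \\
&= \tfrac{1}{8}\left(1 + 6i + 15i^2 + 20i^3 + 15i^4 + 6i^5 + i^6\right) \\
&= \tfrac{1}{8}\left(1 + 6i - 15 - 20i + 15 + 6i - 1\right) \\
&= \tfrac{1}{8}(-8i) = -i.
\end{aligned}$$

• With De Moivre’s Theorem:

$$\begin{aligned}
z^6 &= [\cos(45°) + i\sin(45°)]^6 = \cos(6 \cdot 45°) + i\sin(6 \cdot 45°) \\
&= \cos(270°) + i\sin(270°) = -i.
\end{aligned}$$ ▲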
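Both computations of Example 3.21 can be confirmed numerically. The Python sketch below is an illustration under our own naming (the helper `unit` is not from the text); it builds the point on the unit circle at a given angle in degrees and compares a directly computed power with the value predicted by De Moivre's Theorem.

```python
import math


def unit(theta_degrees: float) -> complex:
    """The point cos(theta) + i sin(theta) on the unit circle, theta in degrees."""
    t = math.radians(theta_degrees)
    return complex(math.cos(t), math.sin(t))


# Part (i): the 40th power of the unit vector at 3 degrees
# should be the unit vector at 120 degrees.
lhs = unit(3) ** 40
rhs = unit(120)
print(lhs, rhs)

# Part (ii): the 6th power of the unit vector at 45 degrees should be -i.
z6 = unit(45) ** 6
print(z6)
```

The printed values agree only up to floating-point rounding, so a comparison should use a small tolerance rather than exact equality.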

Polar Decomposition and De Moivre’s Theorem combine to give a nice
formula for computing powers of any complex number.

Corollary 3.22. If $z = r(\cos\alpha + i\sin\alpha)$ is a complex number, then

$$z^n = r^n\left(\cos(n\alpha) + i\sin(n\alpha)\right).$$

Of course, we must find the polar form of $z$, which involves finding
$\theta = \cos^{-1}(a/|z|)$.

We were unable to find a cube root of $z = a + ib$ earlier, but there’s no
problem now.

Corollary 3.23. Let $r(\cos\theta + i\sin\theta)$ be the polar form of a complex number $z$.
If $n \ge 1$ is an integer, then

$$\left(\sqrt[n]{r}\,\left(\cos(\theta/n) + i\sin(\theta/n)\right)\right)^n = r(\cos\theta + i\sin\theta) = z.$$

Example 3.24. In Example 3.15, we saw that the polar form of $z = 3 + 4i$ is
approximately $5(\cos(53.13°) + i\sin(53.13°))$. Now $(53.13)/3 = 17.71$, and
so a cube root of $z$ is approximately $\sqrt[3]{5}\,(\cos(17.71°) + i\sin(17.71°))$. Our
calculator says that

$$\left(\sqrt[3]{5}\,(\cos 17.71° + i\sin 17.71°)\right)^3 = 3.000001 + 3.99999i.$$ ▲
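The recipe of Corollary 3.23 (take the real $n$th root of the length and divide the angle by $n$) translates directly into code. This is a minimal Python sketch under our own naming; `nth_root` returns just one of the $n$th roots, the one produced by the angle that `cmath.polar` reports.

```python
import cmath


def nth_root(z: complex, n: int) -> complex:
    """One n-th root of z: root of the absolute value, argument divided by n."""
    r, theta = cmath.polar(z)                  # z = r(cos theta + i sin theta)
    return cmath.rect(r ** (1.0 / n), theta / n)


z = complex(3, 4)
w = nth_root(z, 3)
print(w)          # a cube root of 3 + 4i
print(w ** 3)     # should be 3 + 4i up to rounding
```

Working in radians with full floating-point precision avoids the small discrepancy that appears when the angle is rounded to 17.71° before cubing.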


We are now going to describe a beautiful formula discovered by Euler.
Recall some power series formulas from calculus. For every real number $x$,

$$e^x = 1 + x + \frac{x^2}{2!} + \cdots + \frac{x^n}{n!} + \cdots,$$

$$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots + \frac{(-1)^n x^{2n}}{(2n)!} + \cdots,$$

and

$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots + \frac{(-1)^n x^{2n+1}}{(2n+1)!} + \cdots.$$

We can define convergence of power series $\sum c_n z^n$ for $z$ and $c_n$ complex
numbers, and we can then show that the series

$$1 + z + \frac{z^2}{2!} + \cdots + \frac{z^n}{n!} + \cdots$$

converges for every complex number $z$. The complex exponential $e^z$ is defined
to be the sum of this series. In particular, the series for $e^{ix}$ converges for all
real numbers $x$, and

$$e^{ix} = 1 + ix + \frac{(ix)^2}{2!} + \cdots + \frac{(ix)^n}{n!} + \cdots.$$

Theorem 3.25 (Euler). For all real numbers $x$,

$$e^{ix} = \cos x + i\sin x.$$

It is said that Euler was delighted by the special case

$$e^{i\pi} + 1 = 0,$$

for it contains five important constants in one equation.

Sketch of proof. We will not discuss necessary arguments involving convergence.
As $n$ varies over $0, 1, 2, 3, 4, 5, \ldots$, the powers of $i$ repeat every four
steps: that is, the sequence

$$1,\ i,\ i^2,\ i^3,\ i^4,\ i^5,\ i^6,\ i^7,\ i^8,\ i^9,\ i^{10},\ i^{11},\ \ldots$$

is actually

$$1,\ i,\ -1,\ -i,\ 1,\ i,\ -1,\ -i,\ 1,\ i,\ -1,\ -i,\ \ldots;$$

the even powers of $i$ are all real, whereas the odd powers all involve $i$. It
follows, for every real number $x$, that $(ix)^n = i^n x^n$ takes values

$$1,\ ix,\ -x^2,\ -ix^3,\ x^4,\ ix^5,\ -x^6,\ -ix^7,\ x^8,\ ix^9,\ -x^{10},\ -ix^{11},\ \ldots$$

Thus, in the definition of the complex exponential,

$$e^{ix} = 1 + ix + \frac{(ix)^2}{2!} + \cdots + \frac{(ix)^n}{n!} + \cdots,$$

the even powers of $ix$ do not involve $i$, whereas the odd powers do. Collecting
terms, one has $e^{ix} = \text{even terms} + \text{odd terms}$. But

$$\begin{aligned}
\text{even terms} &= 1 + \frac{(ix)^2}{2!} + \frac{(ix)^4}{4!} + \cdots \\
&= 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots = \cos x
\end{aligned}$$

and

$$\begin{aligned}
\text{odd terms} &= ix + \frac{(ix)^3}{3!} + \frac{(ix)^5}{5!} + \cdots \\
&= i\left(x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots\right) = i\sin x.
\end{aligned}$$

Therefore, $e^{ix} = \cos x + i\sin x$. ■
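Although the convergence arguments are omitted from the sketch, the conclusion can at least be made plausible by summing finitely many terms of the series. The Python sketch below is an illustration, not a proof; the function name `exp_series` and the default of 40 terms are our own assumptions, chosen so that the neglected tail is far below floating-point precision for the small inputs used here.

```python
import math


def exp_series(z: complex, terms: int = 40) -> complex:
    """Partial sum 1 + z + z^2/2! + ... + z^(terms-1)/(terms-1)!."""
    total = 0j
    term = 1 + 0j                # the current term z^n / n!, starting at n = 0
    for n in range(terms):
        total += term
        term *= z / (n + 1)      # turn z^n/n! into z^(n+1)/(n+1)!
    return total


x = 1.0
approx = exp_series(1j * x)
exact = complex(math.cos(x), math.sin(x))
print(approx, exact)

# The special case x = pi: e^{i pi} + 1 should be (numerically) 0.
euler_gap = abs(exp_series(1j * math.pi) + 1)
print(euler_gap)
```

The real part of the partial sum collects the even terms and the imaginary part collects the odd terms, mirroring the way the sketch of the proof splits the series.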


As a consequence of Euler’s Theorem, the polar decomposition can be
rewritten in exponential form: every complex number $z$ has a factorization

$$z = re^{i\theta},$$

where $r \ge 0$ and $0 \le \theta < 2\pi$.

We have chosen to denote the sum of the series $\sum_{n \ge 0} (ix)^n/n!$ by $e^{ix}$, but we
cannot assert, merely as a consequence of our notation, that the law of exponents,
$e^{ix}e^{iy} = e^{i(x+y)}$, is valid. But this is precisely what Corollary 3.19 says once it is
translated into exponential notation.

Theorem 3.26 (Exponential Addition Theorem). For all real numbers $x$ and
$y$,

$$e^{ix}e^{iy} = e^{i(x+y)}.$$

Proof. According to Corollary 3.19,

$$\begin{aligned}
e^{ix}e^{iy} &= (\cos x + i\sin x)(\cos y + i\sin y) \\
&= \cos(x + y) + i\sin(x + y) \\
&= e^{i(x+y)}.
\end{aligned}$$ ■

We can also translate De Moivre’s Theorem into exponential notation.

Corollary 3.27 (Exponential De Moivre). For every real number $x$ and all
integers $n \ge 1$,

$$(e^{ix})^n = e^{inx}.$$

Proof. According to De Moivre’s Theorem,

$$(e^{ix})^n = (\cos x + i\sin x)^n = \cos(nx) + i\sin(nx) = e^{inx}.$$ ■

It is easier to remember the trigonometric addition formulas in complex
form. For example, let’s find the triple angle formulas. On the one hand, De
Moivre’s Theorem gives

$$e^{i3x} = \cos(3x) + i\sin(3x).$$

On the other hand,

$$\begin{aligned}
e^{i3x} &= (e^{ix})^3 \\
&= (\cos x + i\sin x)^3 \\
&= \cos^3 x + 3i\cos^2 x\sin x + 3i^2\cos x\sin^2 x + i^3\sin^3 x \\
&= \cos^3 x - 3\cos x\sin^2 x + i\left(3\cos^2 x\sin x - \sin^3 x\right).
\end{aligned}$$

Equating real and imaginary parts, we have

$$\cos(3x) = \cos^3 x - 3\cos x\sin^2 x$$

and

$$\sin(3x) = 3\cos^2 x\sin x - \sin^3 x.$$


Roots of Unity 

De Moivre’s Theorem can be used to find the roots of an important family of
polynomials: those of the form $x^n - 1$.

Theorem 3.28. The distinct roots of $x^n - 1$ are

$$1,\ \zeta,\ \zeta^2,\ \ldots,\ \zeta^{n-1},$$

where $\zeta = e^{2\pi i/n} = \cos(2\pi/n) + i\sin(2\pi/n)$. These numbers are equally spaced
on the unit circle and are the vertices of a regular polygon, called the unit
n-gon.

Proof. By Corollary 3.23, $\zeta^n = 1$, so that $\zeta$ is a root of $x^n - 1$. Furthermore,
for any nonnegative integer $k$, we have $(\zeta^k)^n = (\zeta^n)^k = 1$, so that all $\zeta^k$ are
also roots of $x^n - 1$. But there are repetitions on the list $1, \zeta, \zeta^2, \ldots$. By the
Division Algorithm, for any $j$, we have $j = qn + r$, where $0 \le r \le n - 1$.
Hence,

$$\zeta^j = \zeta^{qn + r} = \zeta^{qn}\zeta^r = \zeta^r,$$

because $\zeta^{qn} = 1$.

On the other hand, all the $\zeta^k$, for $0 \le k \le n - 1$, are distinct. After all, by
De Moivre’s Theorem,

$$\zeta^k = \cos(2\pi k/n) + i\sin(2\pi k/n),$$

and Proposition 3.14, uniqueness of polar forms, applies, for $0 \le 2\pi k/n < 2\pi$
are $n$ distinct angles. Therefore, we have displayed $n$ distinct roots of $x^n - 1$.

These are all the roots of $x^n - 1$, for a polynomial of degree $n$ can have at
most $n$ distinct roots. We’ll give a proof of this later (see Theorem 6.16) but,
since we haven’t yet proved this result, we now proceed in a different way.

If $z \in \mathbb{C}$ is a root of $x^n - 1$, then $1 = |z^n| = |z|^n$, so that $|z| = 1$,
by Exercise 1.73 on page 41, and $z = \cos\theta + i\sin\theta$ for some $\theta$. By De
Moivre’s Theorem, $1 = z^n = \cos(n\theta) + i\sin(n\theta)$, so that $n\theta = 2\pi k$ for
some integer $k$; hence, $\theta = 2\pi k/n$. Write $k = qn + r$, where $0 \le r < n$, and

$$z = \cos(2\pi k/n) + i\sin(2\pi k/n) = \cos(2\pi r/n) + i\sin(2\pi r/n).$$

Thus, $z$ is equal to the root $\zeta^r$ already displayed.

Finally, since $\arg(\zeta^k) = k\arg(\zeta) = 2\pi k/n$, the roots $\zeta^k$ are equally
spaced around the circle and, hence, they are the vertices of a regular n-gon.
(See Figure 3.7 for the case $n = 8$.) ■
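The statement of Theorem 3.28 can be explored numerically. The following Python sketch is only an illustration with names of our own choosing (`roots_of_unity`, `side_lengths`); it lists the powers $\zeta^k$ for $n = 8$, checks that each is a root of $x^8 - 1$, and measures the distances between neighboring roots, which should all agree if the roots are the vertices of a regular 8-gon.

```python
import cmath
import math


def roots_of_unity(n: int) -> list[complex]:
    """The numbers zeta^k = cos(2 pi k/n) + i sin(2 pi k/n) for k = 0, ..., n-1."""
    return [cmath.exp(2j * math.pi * k / n) for k in range(n)]


n = 8
roots = roots_of_unity(n)

# Each listed number should satisfy z^n = 1 (up to rounding).
max_error = max(abs(z ** n - 1) for z in roots)

# Neighboring roots should be equally far apart (sides of a regular n-gon).
side_lengths = [abs(roots[(k + 1) % n] - roots[k]) for k in range(n)]

print(max_error)
print(min(side_lengths), max(side_lengths))
```

Changing `n` lets one plot or inspect the unit n-gon for other values, for example the cases asked about in Exercise 3.40.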


Definition. The roots of $x^n - 1$ are called the nth roots of unity. An nth root
of unity $\zeta$ is a primitive nth root of unity if $n$ is the smallest positive integer
for which $\zeta^n = 1$.

For every $n \ge 1$, we see that $\zeta = e^{2\pi i/n}$ is a primitive nth root of unity, for
if $1 \le m < n$, then $\zeta^m = \cos(2\pi m/n) + i\sin(2\pi m/n) \ne 1$. In particular,
$i = \cos(2\pi/4) + i\sin(2\pi/4)$ is a primitive fourth root of unity, and $\omega =
\tfrac{1}{2}(-1 + i\sqrt{3}) = \cos(2\pi/3) + i\sin(2\pi/3)$ is a primitive cube root of unity.


Corollary 3.29. Let $\zeta^k = \cos(2\pi k/n) + i\sin(2\pi k/n)$ be an nth root of unity.

(i) $\zeta^k$ is a primitive nth root of unity if and only if $\gcd(k, n) = 1$.

(ii) If $\zeta^k$ is a primitive nth root of unity, then every nth root of unity is a power
of $\zeta^k$.

Proof. (i) Suppose that $\zeta^k$ is a primitive nth root of unity. If $d = \gcd(k, n) >
1$, then $n/d < n$, and

$$(\zeta^k)^{n/d} = (\zeta^n)^{k/d} = 1.$$

This contradicts $n$ being the smallest positive integer with $(\zeta^k)^n = 1$.

Conversely, suppose that $\gcd(k, n) = 1$ but that $\zeta^k$ is not primitive; that is,
$(\zeta^k)^m = 1$ for some positive integer $m < n$. Since, by hypothesis, $\gcd(k, n) = 1$,
there are integers $s$ and $t$ with $1 = sk + tn$; hence, $m = msk + mtn$. But now

$$\zeta^m = \zeta^{msk + mtn} = \zeta^{msk}\,\zeta^{mtn} = 1,$$

which contradicts $\zeta$ being a primitive nth root of unity.

(ii) Every nth root of unity is equal to $\zeta^j$ for some $j$. If $\gcd(k, n) = 1$, then
there are integers $s$ and $t$ with $1 = sk + tn$. Hence,

$$\zeta^j = \zeta^{jsk + jtn} = \zeta^{jsk}\,\zeta^{jtn} = (\zeta^k)^{js}.$$ ■


Definition. For every integer $n \ge 1$, define the Euler $\phi$-function $\phi(n)$ by

$$\phi(n) = \text{number of } k \text{ with } 1 \le k \le n \text{ and } \gcd(k, n) = 1.$$

For example, $\phi(1) = 1$ and, if $p$ is prime, $\phi(p) = p - 1$.

Corollary 3.30. For every positive integer $n$, there are exactly $\phi(n)$ primitive
nth roots of unity.

Proof. This follows at once from Corollary 3.29(i). ■

Example 3.31. The complex number $\cos(\tfrac{2\pi}{n}) + i\sin(\tfrac{2\pi}{n})$ is a primitive nth
root of unity, by Theorem 3.28. The primitive 8th roots of unity (shown in Figure 3.7)
are

$$\cos(\tfrac{2\pi}{8}) + i\sin(\tfrac{2\pi}{8}), \quad \cos(\tfrac{6\pi}{8}) + i\sin(\tfrac{6\pi}{8}),$$

$$\cos(\tfrac{10\pi}{8}) + i\sin(\tfrac{10\pi}{8}), \quad \cos(\tfrac{14\pi}{8}) + i\sin(\tfrac{14\pi}{8});$$

that is, the primitive 8th roots of unity are all those $\cos(\tfrac{2\pi k}{8}) + i\sin(\tfrac{2\pi k}{8})$ for
which $\gcd(k, 8) = 1$. ▲
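The count of primitive roots can be checked by brute force using exact integer arithmetic on exponents. The Python sketch below is an illustration under our own naming (`phi`, `order`, `primitive_exponents`); it relies on the fact, which follows from Theorem 3.28, that $\zeta^j = 1$ exactly when $n$ divides $j$, so the smallest $m$ with $(\zeta^k)^m = 1$ is the smallest $m \ge 1$ with $n$ dividing $mk$.

```python
from math import gcd


def phi(n: int) -> int:
    """Euler phi-function: the number of k with 1 <= k <= n and gcd(k, n) = 1."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def order(k: int, n: int) -> int:
    """Smallest m >= 1 with (zeta^k)^m = 1, that is, with n dividing m*k."""
    m = 1
    while (m * k) % n != 0:
        m += 1
    return m


def primitive_exponents(n: int) -> list[int]:
    """Exponents k in 0..n-1 for which zeta^k is a primitive n-th root of unity."""
    return [k for k in range(n) if order(k, n) == n]


print(primitive_exponents(8))   # the exponents k with gcd(k, 8) = 1
print(phi(8))
```

Working with exponents modulo $n$ rather than with floating-point complex numbers keeps the test for $(\zeta^k)^m = 1$ exact.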



Figure 3.7. 8th roots of unity. 


The nth roots of unity enjoy some remarkable properties that we’ll use in 
upcoming chapters; here are some of them. (See Exercise 3.51 on page 115 
and Proposition 6.63 for some other interesting properties.) 

Theorem 3.32. Let $\zeta$ be an nth root of unity.

(i) If $\zeta \ne 1$, then $1 + \zeta + \zeta^2 + \cdots + \zeta^{n-1} = 0$.

(ii) $\overline{\zeta^k} = 1/\zeta^k$ for every nonnegative integer $k$.

(iii) If $k = qn + r$, then $\zeta^k = \zeta^r$.

Proof. (i) We have $x^n - 1 = (x - 1)q(x)$, and we find $q$ by long division:

$$(x - 1)(1 + x + x^2 + \cdots + x^{n-1}) = x^n - 1.$$

Now set $x = \zeta$ to see that

$$(\zeta - 1)(1 + \zeta + \zeta^2 + \cdots + \zeta^{n-1}) = \zeta^n - 1 = 0.$$

But $\zeta - 1 \ne 0$, and the result follows.

(ii) This follows from Exercise 3.38 on page 106.

(iii) This follows from $\zeta^n = 1$. ■

Theorem 3.28 establishes an intimate connection between the nth roots of
unity and the geometry of the unit n-gon. The next examples illustrate this
connection for small $n$.

See Exercise 3.49 on page 115 for more examples.

Example 3.33. The vertices of unit n-gons for small values of $n$ can be
calculated with plane geometry.

• The vertices of the unit 3-gon are $1$, $\tfrac{1}{2}(-1 + i\sqrt{3})$, $\tfrac{1}{2}(-1 - i\sqrt{3})$.

• The vertices of the unit 4-gon are $1$, $i$, $-1$, $-i$.

• The vertices of the unit 6-gon are

$$1,\ \tfrac{1}{2}(1 + i\sqrt{3}),\ \tfrac{1}{2}(-1 + i\sqrt{3}),\ -1,\ \tfrac{1}{2}(-1 - i\sqrt{3}),\ \tfrac{1}{2}(1 - i\sqrt{3}).$$ ▲

Example 3.34 (Regular Pentagon). Since a primitive 5th root of unity is $\zeta =
\cos(2\pi/5) + i\sin(2\pi/5)$, by Theorem 3.28, the vertices of the unit 5-gon are

$$\begin{aligned}
\zeta &= \cos(2\pi/5) + i\sin(2\pi/5) \\
\zeta^2 &= \cos(4\pi/5) + i\sin(4\pi/5) \\
\zeta^3 &= \cos(6\pi/5) + i\sin(6\pi/5) \\
\zeta^4 &= \cos(8\pi/5) + i\sin(8\pi/5) \\
\zeta^5 &= \cos(10\pi/5) + i\sin(10\pi/5) = 1 = \zeta^0.
\end{aligned}$$

Can we find explicit expressions for these vertices that don’t involve
trigonometry? We’ll obtain such an expression for $\cos(2\pi/5)$, but we’ll leave the rest of
the details for you (Exercise 3.49(i) on page 115); after all, you can evaluate,
say, $\cos(8\pi/5)$.

We have $\overline{\zeta} = \zeta^4$, by Theorem 3.32(ii). Inspired by Lemma 3.2, we define

$$g = \zeta + \zeta^4 \quad\text{and}\quad h = \zeta^2 + \zeta^3.$$

Now

$$g = \zeta + \zeta^4 = 2\cos(2\pi/5),$$

a real number that is twice the number we are after. Similarly,

$$h = \zeta^2 + \zeta^3 = 2\cos(4\pi/5),$$

another real number. By Theorem 3.32(i),

$$g + h = \zeta + \zeta^4 + \zeta^2 + \zeta^3 = -1.$$

Thus, we know that $g + h = -1$. Do we also know $gh$? Let’s see.

$$\begin{aligned}
gh &= (\zeta + \zeta^4)(\zeta^2 + \zeta^3) \\
&= \zeta^3 + \zeta^4 + \zeta^6 + \zeta^7 \\
&= \zeta^3 + \zeta^4 + \zeta + \zeta^2 \quad \text{by Theorem 3.32(iii)} \\
&= -1.
\end{aligned}$$

Hence, $g + h = -1 = gh$, and so $g$ and $h$ are the roots of

$$x^2 + x - 1.$$

Now $x^2 + x - 1$ has a positive root and a negative one. Since $g > h$ (why?),
the positive root is $g$, and so

$$\cos(2\pi/5) = \tfrac{1}{2}g = \tfrac{1}{4}(-1 + \sqrt{5}).$$ ▲
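The pentagon computation of Example 3.34 can be double-checked numerically. The Python sketch below is only an illustration; the variable names follow the example ($\zeta$, $g$, $h$), and the comparisons hold up to floating-point rounding.

```python
import cmath
import math

zeta = cmath.exp(2j * math.pi / 5)     # primitive 5th root of unity

g = zeta + zeta ** 4                   # should equal 2 cos(2 pi/5)
h = zeta ** 2 + zeta ** 3              # should equal 2 cos(4 pi/5)

print(g + h)                           # should be -1
print(g * h)                           # should be -1

# g is the positive root of x^2 + x - 1, so cos(2 pi/5) = g/2 = (-1 + sqrt 5)/4.
closed_form = (-1 + math.sqrt(5)) / 4
print(g.real / 2, math.cos(2 * math.pi / 5), closed_form)
```

Since $g + h = -1 = gh$, the quadratic formula applied to $x^2 + x - 1$ gives the roots $\tfrac{1}{2}(-1 \pm \sqrt{5})$, and the three printed values in the last line agree.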

An ancient problem, going back to the Greeks, is to determine which regular
n-gons can be constructed with ruler and compass. As we’ll see in Chapter 7,
the problem comes down to finding an expression for $\cos(2\pi/n)$ that doesn’t
mention trigonometry, only the operations of arithmetic and iterated square
roots (as in Example 3.34). We’ve essentially shown that the regular pentagon
is so constructible. This argument, grouping the $\zeta^k$ into convenient subclusters,
was greatly generalized and refined by Gauss (when he was only 18 years old!)
to show that the vertices of the unit 17-gon can be constructed (Euclid did not
know this!). (This and much more is in Gauss’s masterpiece, Disquisitiones
Arithmeticae.) Gauss requested that his tombstone portray a regular 17-gon,
but the stonemason was unable to carve it, saying it would look more like a
circle than a polygon.


Exercises 

3.46 Is De Moivre’s Theorem true for negative integer exponents? Explain. 

3.47 Let $z = \cos\theta + i\sin\theta$. Show, for all nonnegative integers $n$, that

$$z^n + (\overline{z})^n = 2\cos n\theta \quad\text{and}\quad z^n - (\overline{z})^n = 2i\sin n\theta.$$

3.48 This exercise shows that there’s something special about a 72° angle: there’s only one isosceles triangle (up to similarity) whose base angle is twice the vertex angle, namely, the “72-72-36 triangle.” Let the equal sides of such a triangle have length 1, and let $q$ denote the length of the base.

(i) Bisect one of the base angles of the triangle.

(ii) Show that the small triangle is similar to the whole triangle.

(iii) Use (ii) to show that $\dfrac{q}{1} = \dfrac{1-q}{q}$, and solve for $q$.

(iv) Show that $q/2 = \cos 72^\circ$.

Figure 3.9. 72-72-36 triangle. Figure 3.10. Its construction.


3.49 * Find explicit formulas (i.e., without trigonometry) for the vertices of the unit

(i) pentagon. (ii) decagon. (iii) 20-gon.

3.50 * Let $n$ be a positive integer and let $\zeta = e^{2\pi i/n}$.

(i) Establish the identity

$$x^n - 1 = (x - 1)(x - \zeta)(x - \zeta^2)\cdots(x - \zeta^{n-1}).$$

(ii) If $x$ and $y$ are integers, show that

$$x^n - y^n = (x - y)(x - \zeta y)(x - \zeta^2 y)\cdots(x - \zeta^{n-1} y).$$

(iii) If $x$ and $y$ are integers and $n$ is odd, show that

$$x^n + y^n = (x + y)(x + \zeta y)(x + \zeta^2 y)\cdots(x + \zeta^{n-1} y).$$

3.51 Take It Further. We saw, on page 111, that $\phi(p) = p - 1$, where $p$ is prime and $\phi$ is the Euler $\phi$-function.

(i) Suppose $n$ is the product of two distinct primes, $n = p_1 p_2$. Show that

$$\phi(n) = (p_1 - 1)(p_2 - 1).$$

(ii) Suppose $n$ is the product of two prime powers, $n = p_1^{e_1} p_2^{e_2}$. Show that

$$\phi(n) = n - \frac{n}{p_1} - \frac{n}{p_2} + \frac{n}{p_1 p_2}.$$

(iii) Generalize to show that, if $n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$, then

$$\phi(n) = n\left(1 - \frac{1}{p_1}\right)\left(1 - \frac{1}{p_2}\right)\cdots\left(1 - \frac{1}{p_k}\right).$$

Note that if $n = p_1 p_2$, then $(p_1 - 1)(p_2 - 1) = n - \dfrac{n}{p_1} - \dfrac{n}{p_2} + \dfrac{n}{p_1 p_2}$.

3.52 Prove or disprove and salvage if possible. If $a$ and $b$ are positive integers, then

$$\phi(ab) = \phi(a)\phi(b).$$

3.53 Find explicit formulas (i.e., without trigonometry) for the vertices of the unit $n$-gon if

(i) $n = 3$
(ii) $n = 4$
(iii) $n = 6$
(iv) $n = 8$
(v) $n = 12$
(vi) $n = 16$

3.54 For all integers $n$ between 3 and 9, find all the primitive $n$th roots of unity.

3.55 Find a primitive 12th root of unity $\zeta$. Is $\zeta$ unique?

3.56 Suppose $\zeta = \cos(2\pi/7) + i\sin(2\pi/7)$.

(i) Plot the roots of $x^7 - 1$ in the complex plane.

(ii) Show that $\alpha = \zeta + \zeta^6$, $\beta = \zeta^2 + \zeta^5$, and $\gamma = \zeta^3 + \zeta^4$ are real numbers.

(iii) Find a cubic equation satisfied by $2\cos(2\pi/7)$ by finding the values of $\alpha + \beta + \gamma$, $\alpha\beta + \alpha\gamma + \beta\gamma$, and $\alpha\beta\gamma$.

3.57 If $\zeta = \cos(2\pi/n) + i\sin(2\pi/n)$, evaluate $\sum_{k=0}^{n-1} \zeta^k$.

3.58 Show that $\cos(2\pi/5) + \cos(4\pi/5) = -\tfrac12$.

3.59 Take It Further. If $n$ is a positive integer, how many irreducible factors over $\mathbb{Z}$ does $x^n - 1$ have? In other words, we’re looking for a pattern in the outputs of the function $n \mapsto$ # of factors of $x^n - 1$ over $\mathbb{Z}$. (Use a computer.)

n     Number of factors of x^n − 1
1
2
3
4
5
6
7
8
9
10
11
12



3.4 Connections: Designing Good Problems 

This section will use complex numbers to help create mathematics problems 
that “come out nice.” When launching a new topic, you want to start with 
examples which focus on the new idea; there shouldn’t be any distractions — 
for example, numbers should be simple integers or rationals. Indeed, this is 
why the Babylonians introduced Pythagorean triples. 

Norms 

We begin by introducing a function $\mathbb{C} \to \mathbb{R}$, called the norm, that is closely related to absolute value. It will be an important tool for our applications; it will also be very useful in Chapter 8 when we do some algebraic number theory.

Definition. The norm of a complex number $z = a + bi$ is

$$N(z) = z\overline{z} = a^2 + b^2.$$

Here are some basic properties.

Proposition 3.35. Let $z = a + ib$ and $w$ be complex numbers.

(i) $N(z)$ is a nonnegative real number, and $N(z) = 0$ if and only if $z = 0$.

(ii) $N(z) = |z|^2$.

(iii) $N(zw) = N(z)N(w)$.

Proof. (i) This follows at once from $N(a + bi) = a^2 + b^2$.

(ii) This follows at once from $|z| = |a + bi| = \sqrt{a^2 + b^2}$.

(iii) $N(zw) = zw\,\overline{zw} = zw\,\overline{z}\,\overline{w} = z\overline{z}\,w\overline{w} = N(z)N(w)$. ■

It follows from Proposition 3.35(iii) that

$$N(z^k) = N(z)^k$$

for all $z$ and all $k \ge 0$.
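Proposition 3.35(iii) and its consequence $N(z^k) = N(z)^k$ are easy to test in exact arithmetic. Here is a small Python sketch under our own conventions (none of these names come from the text): a complex number $a + bi$ with rational parts is stored as the pair `(a, b)`, and `norm`, `mul`, and `power` are hypothetical helper names.

```python
from fractions import Fraction


def norm(z):
    """N(a + bi) = a^2 + b^2 for z stored as the pair (a, b)."""
    a, b = z
    return a * a + b * b


def mul(z, w):
    """Product of two complex numbers stored as (real, imaginary) pairs."""
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + b * c)


def power(z, k):
    """z**k for an integer k >= 0, by repeated multiplication."""
    result = (1, 0)
    for _ in range(k):
        result = mul(result, z)
    return result


z = (Fraction(1, 2), Fraction(-3, 4))
w = (3, 2)
print(norm(mul(z, w)) == norm(z) * norm(w))  # multiplicativity
print(norm(power(w, 5)) == norm(w) ** 5)     # N(w^5) = N(w)^5
```

Using `Fraction` and integers avoids rounding, so the equalities are tested exactly rather than approximately.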

Here is an application of the norm. 

Example 3.36. Let’s revisit Example 3.5, the “bad example,” in which the cubic formula gives the roots of $x^3 - 7x + 6 = (x - 1)(x - 2)(x + 3)$ in unrecognizable form.

Imagine again that you have just left the contest in Piazza San Marco, thinking about how $g + h$ could possibly equal 1, where

$$g^3 = -3 + i\tfrac{10}{9}\sqrt3 \quad\text{and}\quad h^3 = -3 - i\tfrac{10}{9}\sqrt3.$$

Had you known about conjugates, you’d have seen that $\overline{g^3} = h^3$. It would have been natural to guess that the cube roots $g$ and $h$ are also complex conjugates (you’d have guessed right: see Exercise 3.64 on page 127); thus, $g = a + ib$ and $h = a - ib$. Now if $g + h = 1$, as your opponent loudly proclaimed, then $(a + ib) + (a - ib) = 2a = 1$; that is,

$$g = \tfrac12 + ib \quad\text{and}\quad h = \tfrac12 - ib.$$

You really want to find $g$ and $h$ now: what is $b$? Using the norm function, you see that

$$N(g)^3 = N(g^3) = (-3)^2 + \left(\tfrac{10}{9}\sqrt3\right)^2 = \tfrac{343}{27}.$$

Since norms are always real numbers, you conclude that

$$N(g) = \tfrac73$$

(the other cube roots are complex; they are $\tfrac73\omega$ and $\tfrac73\omega^2$, where $\omega$ is a primitive cube root of unity). But if $g = \tfrac12 + ib$, then $N(g) = \tfrac14 + b^2$. Hence, $\tfrac14 + b^2 = \tfrac73$ and $b = \pm\tfrac{5}{2\sqrt3}$. The imaginary part of $(\tfrac12 + ib)^3$ is $\tfrac34 b - b^3 = -\tfrac43 b$ when $b^2 = \tfrac{25}{12}$, and the imaginary part of $g^3$ is positive, so $b$ is negative for $g$. Thus,

$$g = \tfrac12 - i\tfrac{5}{2\sqrt3} \quad\text{and}\quad h = \tfrac12 + i\tfrac{5}{2\sqrt3}.$$

Bingo! For these “values” of $g$ and $h$, we have $g + h = 1$. You were right! Elated, you run back to the square to show off $g$ and $h$, but everyone has gone home. ▲
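The norm argument of Example 3.36 can be replayed numerically. The sketch below is ours, not the book’s: it starts from $g^3 = -3 + i\tfrac{10}{9}\sqrt3$, recovers $N(g)$ as a real cube root, solves $\tfrac14 + b^2 = N(g)$ for $|b|$, and then lets the computer decide which sign of $b$ actually cubes to $g^3$. The variable names are assumptions made for illustration.

```python
import math

# g^3 as given by the cubic formula for x^3 - 7x + 6.
g_cubed = complex(-3, 10 * math.sqrt(3) / 9)

# N(g)^3 = N(g^3), so N(g) is the real cube root of N(g^3) = 343/27.
norm_g = (g_cubed.real**2 + g_cubed.imag**2) ** (1 / 3)

# g = 1/2 + ib with N(g) = 1/4 + b^2, so |b| = sqrt(N(g) - 1/4).
b = math.sqrt(norm_g - 0.25)

# Both signs of b have the right norm; keep the one whose cube is g^3.
candidates = [complex(0.5, b), complex(0.5, -b)]
g = min(candidates, key=lambda c: abs(c**3 - g_cubed))
h = g.conjugate()

print(norm_g)  # approximately 7/3
print(g, h)    # conjugate cube roots with real part 1/2
print(g + h)   # the root x = 1 of the cubic
```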


To find the other two roots, see Exercise 3.65 on page 127.

Pippins and Cheese

We call this subsection Pippins and Cheese, a phrase borrowed from Shakespeare’s Merry Wives of Windsor, which describes delicious desserts.

Here are five problems. Solve them now; they are not difficult, but the answers may surprise you.

(i) A triangle has vertices $(-18, 49)$, $(15, -7)$, and $(30, -15)$. How long are its sides?

(ii) In Figure 3.11, the side lengths of $\triangle QSU$ are as marked. How big is $\angle Q$?

Figure 3.11. Side lengths.

(iii) An open box is formed by cutting out squares from a $7 \times 15$ rectangle and folding up the sides (see Figure 3.12). What size cut-out $x$ maximizes the volume of the box?

Figure 3.12. Making boxes.

(iv) Find the zeros, extrema, and inflection points of the function

$$f(x) = 140 - 144x + 3x^2 + x^3.$$

(v) Find the area of the triangle with sides of lengths 13, 14, and 15.

A meta-problem is a problem that asks how to design “nice” exercises of a particular genre, such as “How do you construct integer-sided scalene triangles having a 60° angle?” As we mentioned earlier, finding Pythagorean triples was one of the first meta-problems; it was invented by teachers who wanted to study and apply side-lengths of right triangles. In Section 1.2, we developed the method of Diophantus for this purpose: rational points on the unit circle correspond to Pythagorean triples. In this section, we’ll consider two types of meta-problems: two ways of creating exercises like the five listed above. One meta-problem uses the norm function; the other generalizes Diophantus’s chord method of “sweeping lines” by replacing circles with other conic sections. (There are many other kinds of meta-problems, ranging in topic from exponential equations to algebra word problems to trigonometry.)

See Exercises 1.79–1.82 on page 44.

Gaussian Integers: Pythagorean Triples Revisited

In Chapter 1, we saw that the Pythagorean equation $a^2 + b^2 = c^2$ corresponds to a rational point $\left(\frac{a}{c}, \frac{b}{c}\right)$ on the unit circle. But, given what you’ve just been studying, the Pythagorean equation might conjure up another rewrite in your mind, namely

$$(a + bi)(a - bi) = c^2,$$

or even

$$N(a + bi) = c^2.$$

So, we’re looking for complex numbers $z = a + bi$ whose norms are perfect squares of integers. The Pythagorean equation now looks like

$$N(z) = c^2.$$

For example, $N(3 + 4i) = 5^2$, $N(5 + 12i) = 13^2$, and $N(8 + 15i) = 17^2$.

This idea doesn’t work for every complex number. What’s needed are complex numbers whose real and imaginary parts are integers (and, besides, whose norms are perfect squares). We’d like the real and imaginary parts to be positive integers, but any integers will do, because changing the sign of the real or imaginary part of a complex number doesn’t change its norm (why?).

Definition. The Gaussian integers is the set $\mathbb{Z}[i]$ of all complex numbers whose real and imaginary parts are integers. In symbols,

$$\mathbb{Z}[i] = \{a + bi \in \mathbb{C} : a \in \mathbb{Z} \text{ and } b \in \mathbb{Z}\}.$$

Proposition 3.37. (i) The set $\mathbb{Z}[i]$ of Gaussian integers is closed under addition and multiplication: If $a + bi,\ c + di \in \mathbb{Z}[i]$, then

$$(a + bi) + (c + di) = (a + c) + (b + d)i \in \mathbb{Z}[i]$$

$$(a + bi)(c + di) = (ac - bd) + (ad + bc)i \in \mathbb{Z}[i].$$

(ii) If $z = a + bi$, then

$$N(z) = a^2 + b^2.$$

Proof. The formula for addition is clear; for multiplication, use the fact that $i^2 = -1$. Of course, part (ii) is just the definition of the norm. ■

We’ll investigate the Gaussian integers in more detail in Chapter 8.

Let’s return to the norm equation $N(z) = c^2$ arising from Pythagorean triples, but with $z$ a Gaussian integer. Our question is now “Which Gaussian integers have perfect squares as norms?” The answer comes from Proposition 3.35(iii): if $z$ and $w$ are complex numbers, then $N(zw) = N(z)N(w)$. In particular (letting $z = w$),

$$N(z^2) = N(z)^2.$$

The left-hand side of this equation is the norm of a Gaussian integer: if $z = a + ib$, then $z^2 = (a^2 - b^2) + i\,2ab$; moreover, $N(z^2)$ is a sum of two nonzero perfect squares if $a > 0$, $b > 0$, and $a \ne b$. Now the right-hand side is the square of an integer, namely, $N(z)^2$, which produces a Pythagorean triple. For example, if $z = 3 + 2i$, then $N(z) = 13$ and $z^2 = 5 + 12i$, and we get the Pythagorean triple $(5, 12, 13)$, for

$$5^2 + 12^2 = N\big((3 + 2i)^2\big) = N(3 + 2i)^2 = 13^2.$$

We now have a quick way to generate Pythagorean triples (by hand or with a computer; one of our colleagues uses this method to amaze friends at parties). Pick a Gaussian integer $r + si$ (with $r > 0$, $s > 0$, and $r \ne s$), and square it.

The $r, s$ entry in the following table is $[(r + is)^2, N(r + is)]$. For example, the top entry in the first column, arising from $r = 2$ and $s = 1$, is $[(2 + i)^2, N(2 + i)] = [3 + 4i, 5]$; the corresponding Pythagorean triple is $(3, 4, 5)$.



        s = 1           s = 2           s = 3           s = 4
r = 2   3 + 4i, 5
r = 3   8 + 6i, 10      5 + 12i, 13
r = 4   15 + 8i, 17     12 + 16i, 20    7 + 24i, 25
r = 5   24 + 10i, 26    21 + 20i, 29    16 + 30i, 34    9 + 40i, 41
r = 6   35 + 12i, 37    32 + 24i, 40    27 + 36i, 45    20 + 48i, 52
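The table above is easy to regenerate by computer. Here is a minimal Python sketch of the squaring recipe; the function names `gaussian_square` and `pythagorean_triple` are our own, and we assume $r > s > 0$ so that both legs come out positive.

```python
def gaussian_square(r, s):
    """Return (a, b) with (r + s*i)**2 = a + b*i."""
    return (r * r - s * s, 2 * r * s)


def pythagorean_triple(r, s):
    """Triple (a, b, c) obtained by squaring r + s*i, assuming r > s > 0."""
    a, b = gaussian_square(r, s)
    c = r * r + s * s  # c = N(r + s*i), so a^2 + b^2 = N((r + s*i)^2) = c^2
    return (a, b, c)


# Same layout as the table: rows r = 2..6, columns s = 1..4 with s < r.
for r in range(2, 7):
    row = [pythagorean_triple(r, s) for s in range(1, min(r, 5))]
    print(f"r = {r}:", row)
```

Enlarging the two ranges extends the table as far as you like.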


Eisenstein Integers. 

Let’s now look at the meta-problem of creating triangles with integer side-lengths and a 60° angle.

Let $\angle C = 60^\circ$ in Figure 3.13, so that $\cos(\angle C) = \tfrac12$. By the Law of Cosines,

$$c^2 = a^2 + b^2 - 2ab\cos\angle C = a^2 + b^2 - ab.$$

Eisenstein did extensive research on complex numbers of the form $a + b\zeta$, where $\zeta$ is a primitive $n$th root of unity. Note that $\omega$ is a primitive cube root of unity.

What’s important here is that the right-hand side of the equation, $a^2 - ab + b^2$, is the norm of $a + b\omega$, where $\omega = \tfrac12(-1 + i\sqrt3)$ is a primitive cube root of unity (Exercise 3.72 on page 128). This leads to the following definition.

Definition. The Eisenstein integers is the set $\mathbb{Z}[\omega]$ of all complex numbers of the form $a + b\omega$, where $\omega = \tfrac12(-1 + i\sqrt3)$ is a primitive cube root of unity and $a, b$ are integers. In symbols,

$$\mathbb{Z}[\omega] = \{a + b\omega \in \mathbb{C} : a \in \mathbb{Z} \text{ and } b \in \mathbb{Z}\}.$$

In $\mathbb{Z}[i]$, $i^2 = -1$. In $\mathbb{Z}[\omega]$, $\omega^2 = -1 - \omega$.

Here are some properties of Eisenstein integers.

Proposition 3.38. (i) The set $\mathbb{Z}[\omega]$ of Eisenstein integers is closed under addition and multiplication: If $a + b\omega,\ c + d\omega \in \mathbb{Z}[\omega]$, then

$$(a + b\omega) + (c + d\omega) = (a + c) + (b + d)\omega \in \mathbb{Z}[\omega]$$

$$(a + b\omega)(c + d\omega) = (ac - bd) + (bc + ad - bd)\omega \in \mathbb{Z}[\omega].$$

(ii) If $z = a + b\omega$, then

$$N(z) = a^2 - ab + b^2.$$

Proof. (i) The formula for addition is clear. For multiplication,

$$(a + b\omega)(c + d\omega) = ac + (bc + ad)\omega + bd\omega^2.$$

Since $\omega^2 + \omega + 1 = 0$, we have $\omega^2 = -1 - \omega$, and

$$(a + b\omega)(c + d\omega) = (ac - bd) + (bc + ad - bd)\omega.$$

(ii) As we said above, this is Exercise 3.72 on page 128. ■

Definition. An Eisenstein triple is a triple of positive integers $(a, b, c)$ such that

$$a^2 - ab + b^2 = c^2.$$

The same idea that produces Pythagorean triples from norms of squares of 
Gaussian integers applies to produce Eisenstein triples from norms of squares 
of Eisenstein integers. If z is an Eisenstein integer, then 

N(z 2 ) = N(z) 2 . 

The left-hand side of this equation, being the norm of an Eisenstein integer, 
is of the form a 2 — ab + b 2 . And the right-hand side is the square of the 
integer N(z). Hence a 2 — ab + b 2 is a perfect square, and we have produced 
an Eisenstein triple. 

Example 3.39. If $z = 3 + 2\omega$, then $N(z) = 3^2 - 3\cdot 2 + 2^2 = 7$, and we have

$$\begin{aligned}
z^2 &= 9 + 12\omega + 4\omega^2\\
&= 9 + 12\omega + 4(-1 - \omega)\\
&= 5 + 8\omega.
\end{aligned}$$

Hence, $5^2 - 5\cdot 8 + 8^2 = N(z^2) = N(z)^2 = 7^2$, and $(5, 8, 7)$ is an Eisenstein triple. ▲

In Figure 3.14, we have $\angle Q = 60^\circ$.

Figure 3.14. All sides have integer length.

We have found a quick way to generate Eisenstein triples (by hand or with a computer). Pick an Eisenstein integer $r + s\omega$ (with $r > 0$, $s > 0$, and $r \ne s$) and square it.

The $r, s$ entry in the following table is $\big((r + s\omega)^2, N(r + s\omega)\big)$. For example, the top entry in the first column, which arises from $r = 2$ and $s = 1$, is $\big((2 + \omega)^2, N(2 + \omega)\big) = (3 + 3\omega, 3)$; the corresponding Eisenstein triple gives $(3, 3, 3)$, which is an equilateral triangle. One of our friends calls this table a “candy store of patterns.” Which entries give equilateral triangles?



         s = 1           s = 2           s = 3           s = 4
r = 2    3 + 3ω, 3
r = 3    8 + 5ω, 7       5 + 8ω, 7
r = 4    15 + 7ω, 13     12 + 12ω, 12    7 + 15ω, 13
r = 5    24 + 9ω, 21     21 + 16ω, 19    16 + 21ω, 19    9 + 24ω, 21
r = 6    35 + 11ω, 31    32 + 20ω, 28    27 + 27ω, 27    20 + 32ω, 28
r = 7    48 + 13ω, 43    45 + 24ω, 39    40 + 33ω, 37    33 + 40ω, 37
r = 8    63 + 15ω, 57    60 + 28ω, 52    55 + 39ω, 49    48 + 48ω, 48
r = 9    80 + 17ω, 73    77 + 32ω, 67    72 + 45ω, 63    65 + 56ω, 61
r = 10   99 + 19ω, 91    96 + 36ω, 84    91 + 51ω, 79    84 + 64ω, 76
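The same recipe can be automated for Eisenstein integers. In this Python sketch (our own illustration; the names `eisenstein_mul`, `eisenstein_norm`, and `eisenstein_triple` are hypothetical), $a + b\omega$ is stored as the integer pair `(a, b)` and products are reduced with $\omega^2 = -1 - \omega$, exactly as in Proposition 3.38.

```python
def eisenstein_mul(z, w):
    """(a + b*omega)(c + d*omega), reduced using omega**2 = -1 - omega."""
    a, b = z
    c, d = w
    return (a * c - b * d, b * c + a * d - b * d)


def eisenstein_norm(z):
    """N(a + b*omega) = a^2 - a*b + b^2."""
    a, b = z
    return a * a - a * b + b * b


def eisenstein_triple(r, s):
    """Triple (a, b, c) with (r + s*omega)**2 = a + b*omega and c = N(r + s*omega)."""
    z = (r, s)
    a, b = eisenstein_mul(z, z)
    return (a, b, eisenstein_norm(z))


print(eisenstein_triple(3, 2))   # the triple of Example 3.39
print(eisenstein_triple(10, 4))  # bottom-right entry of the table
```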


Eisenstein Triples and Diophantus 

There’s another, geometric, way to generate Eisenstein triples, using the same idea as the method of Diophantus in Chapter 1. If $(a, b, c)$ is an Eisenstein triple, so that

$$a^2 - ab + b^2 = c^2,$$

then dividing by $c^2$ gives

$$(a/c)^2 - (a/c)(b/c) + (b/c)^2 = 1.$$

See Exercise 1.79 on page 44.

Thus, $(a/c, b/c)$ is a rational point on the ellipse with equation

$$x^2 - xy + y^2 = 1.$$

(See Figure 3.15.) As with the unit circle, the graph contains $(-1, 0)$, and we can use the chord method idea of Diophantus.

See Exercise 3.66 on page 128.

Proposition 3.40. Let $\ell$ be a line through $(-1, 0)$ which intersects the ellipse with equation $x^2 - xy + y^2 = 1$ in a point $P$. If $\ell$ has rational slope, then $P$ has rational coordinates, $P = (a/c, b/c)$, and

$$a^2 - ab + b^2 = c^2.$$

If $P = (a/c, b/c)$ is in the first quadrant, then $(a, b, c)$ is an Eisenstein triple.

Proof. The proof is almost identical to the proof of Proposition 1.2. We leave it to you to fill in the details. ■

For example, if $\ell$ has slope $\tfrac14$ and equation $y = \tfrac14(x + 1)$, then $\ell$ intersects the ellipse in $\left(\tfrac{15}{13}, \tfrac{7}{13}\right)$, and $(15, 7, 13)$ is an Eisenstein triple. So, the triangle whose side lengths are 15, 7, and 13 has a 60° angle. Which angle is it?
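The chord method can be sketched in a few lines of Python. The formula used below is our own derivation, not quoted from the text: substituting $y = m(x + 1)$ into $x^2 - xy + y^2 = 1$ and dividing out the known root $x = -1$ gives $x = (1 - m^2)/(1 - m + m^2)$ and $y = m(2 - m)/(1 - m + m^2)$. The function names are hypothetical.

```python
from fractions import Fraction
from math import gcd


def ellipse_point(m):
    """Second intersection of y = m(x + 1) with x^2 - x*y + y^2 = 1."""
    m = Fraction(m)
    d = 1 - m + m * m  # positive for every rational m
    return ((1 - m * m) / d, m * (2 - m) / d)


def triple_from_slope(m):
    """Clear denominators of the point to get (a, b, c).

    The point lies in the first quadrant when 0 < m < 1.
    """
    x, y = ellipse_point(m)
    c = x.denominator * y.denominator // gcd(x.denominator, y.denominator)
    return (int(x * c), int(y * c), c)


print(ellipse_point(Fraction(1, 4)))
print(triple_from_slope(Fraction(1, 4)))
```

Because `Fraction` arithmetic is exact, the returned point satisfies the ellipse equation exactly, not just approximately.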

Nice Boxes 

Our next application is to a “box problem.” In an a x b rectangle, cut out little 
squares at the corners, and then fold up the sides to form an open-top box (see 
Figure 3.16). What size cut-out maximizes the volume of the box? For most 
rectangles, the best cut-out has irrational side length. The meta-problem: 

How can we find a and b to make the optimal cut-out a rational number? 



Figure 3.16. Box problem. 

As we tell our students, let the size of the cut-out be $x$. Then the volume of the box is a function of $x$:

$$V(x) = (a - 2x)(b - 2x)x = 4x^3 - 2(a + b)x^2 + abx,$$

and its derivative is

$$V'(x) = 12x^2 - 4(a + b)x + ab.$$

We want $V'(x)$ to have rational zeros, and so its discriminant

$$16(a + b)^2 - 48ab$$

should be a perfect square. But 16 is a perfect square, and so

$$(a + b)^2 - 3ab = a^2 - ab + b^2$$

should be a perfect square. This will be so if $a$ and $b$ are the legs of an Eisenstein triple $(a, b, c)$.

For example, from the Eisenstein triple $(7, 15, 13)$, we get a $7 \times 15$ rectangle that can be used to create a box whose maximum volume occurs at a rational-length cut-out. The volume of the resulting box is

$$V(x) = (7 - 2x)(15 - 2x)x = 4x^3 - 44x^2 + 105x.$$

So, $V'(x) = 12x^2 - 88x + 105$. The roots of $V'(x)$ are $\tfrac32$ and $\tfrac{35}{6}$. Both are rational, but only $\tfrac32$ fits the context and maximizes $V$. (Why doesn’t $\tfrac{35}{6}$ fit the context? What significance does it have? Also, see Exercise 3.69 on page 128.)
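The box computation can be checked exactly. The following Python sketch is our own illustration (the names `critical_cutouts` and `volume` are hypothetical): it solves $V'(x) = 12x^2 - 4(a+b)x + ab = 0$ with the quadratic formula, using the fact from the text that the discriminant is $16(a^2 - ab + b^2)$, and returns `None` when that quantity is not a perfect square.

```python
from fractions import Fraction
from math import isqrt


def critical_cutouts(a, b):
    """Rational roots of V'(x) = 12x^2 - 4(a + b)x + ab, or None if irrational."""
    n = a * a - a * b + b * b  # the discriminant of V' is 16 * n
    c = isqrt(n)
    if c * c != n:
        return None
    # x = (4(a + b) +/- 4c) / 24 = (a + b +/- c) / 6
    return (Fraction(a + b - c, 6), Fraction(a + b + c, 6))


def volume(a, b, x):
    """Volume of the open box made from an a-by-b sheet with cut-out x."""
    return (a - 2 * x) * (b - 2 * x) * x


small, large = critical_cutouts(7, 15)
print(small, large)
print(volume(7, 15, small))
```

Only the smaller root is a feasible cut-out, since the larger one exceeds half the shorter side.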

Nice Functions for Calculus Problems 

Our next meta-problem is one that has occupied faculty room discussions about 
calculus teaching for years. 

How do you find cubic polynomials $f(x)$ with integer coefficients and rational roots, whose extrema and inflection points have rational coordinates?

No cheating: we want the extrema points and inflection points to be distinct. We’ll actually create cubics in which all these points have integer coordinates.

Using the notation of Theorem 3.3, we can first assume that the cubic $f$ is reduced; that is, it has form

$$f(x) = x^3 + qx + r.$$

This immediately guarantees that $f''(x) = 6x$ has an integer root, namely 0 (the inflection point of the graph is on the $y$-axis). Next, if we replace $q$ by $-3p^2$ for some integer $p$, then $f'(x) = 3x^2 - 3p^2$, and $f'(x)$ also has integer roots. So, our cubic now looks like $f(x) = x^3 - 3p^2x + r$. This will have rational extrema and inflection points (what are they?), so all we have to do is ensure that it has three rational roots.

If $f$ has two rational roots, it has three (why?), and so it’s enough to make two roots, say $-\alpha$ and $\beta$, rational (we use $-\alpha$ instead of $\alpha$ because we’ve experimented a bit and found that this makes the calculations come out nicer). But if $f(-\alpha) = f(\beta) = 0$, we have

$$-\alpha^3 + 3p^2\alpha = \beta^3 - 3p^2\beta$$

or

$$\beta^3 + \alpha^3 = 3p^2(\alpha + \beta).$$

We can divide both sides by $\alpha + \beta$, for $\alpha + \beta \ne 0$ (lest $-\alpha = \beta$; remember that we want our roots distinct); we obtain

$$\alpha^2 - \alpha\beta + \beta^2 = 3p^2.$$

Eisenstein integers again. This is the same as

$$N(\alpha + \beta\omega) = 3p^2.$$

This time we want an Eisenstein integer whose norm is 3 times a square.

We’re in luck: the equation $a^2 - ab + b^2 = 3$ has several integer solutions, including $(1, 2)$ and $(1, -1)$; the latter gives $3 = N(1 - \omega)$. Hence we just need to take $\alpha + \beta\omega$ to be $1 - \omega$ times the square of an Eisenstein integer. Indeed, if

$$\alpha + \beta\omega = (1 - \omega)(r + s\omega)^2,$$

then

$$\begin{aligned}
\alpha^2 - \alpha\beta + \beta^2 = N(\alpha + \beta\omega) &= N\big((1 - \omega)(r + s\omega)^2\big)\\
&= N(1 - \omega)\,N\big((r + s\omega)^2\big)\\
&= 3N(r + s\omega)^2,
\end{aligned}$$

which is 3 times the square of an integer.

Example 3.41. Let’s take $s = 1$ and $r = 3$. Then we have

$$\alpha + \beta\omega = (1 - \omega)(3 + \omega)^2 = 13 + 2\omega.$$

This tells us several things:

(i) Since $N(13 + 2\omega) = 147$, our cubic is

$$f(x) = x^3 - 147x + r.$$

(ii) Because $147 = 3\cdot 7^2$ (so, $p = 7$), $f'(x) = 3x^2 - 3\cdot 49$ will have rational roots: $\pm 7$.

(iii) Since two roots of our cubic are $-\alpha$ and $\beta$, two roots are $-13$ and $2$.

This lets us find $r$. Since

$$2^3 - 147\cdot 2 + r = 0,$$

we have $r = 286$. Hence our cubic is

$$f(x) = x^3 - 147x + 286.$$

You can check that the third root is 11 and that the extrema and inflection points are rational. ▲
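The recipe of Example 3.41 can be sketched in Python. This is our own illustration, with hypothetical names: Eisenstein integers are stored as pairs `(a, b)` for $a + b\omega$, and `constant` plays the role of the book’s constant term $r$ (renamed because `r` is already the input). We use the fact that a cubic with no $x^2$ term has roots summing to 0, so the third root is $\alpha - \beta$.

```python
def eisenstein_mul(z, w):
    """(a + b*omega)(c + d*omega), reduced using omega**2 = -1 - omega."""
    a, b = z
    c, d = w
    return (a * c - b * d, b * c + a * d - b * d)


def nice_cubic(r, s):
    """Return (p, roots, constant) for f(x) = x^3 - 3*p^2*x + constant."""
    square = eisenstein_mul((r, s), (r, s))
    # alpha + beta*omega = (1 - omega)(r + s*omega)^2
    alpha, beta = eisenstein_mul((1, -1), square)
    p = r * r - r * s + s * s             # p = N(r + s*omega)
    roots = (-alpha, beta, alpha - beta)  # the three roots sum to 0
    constant = -(roots[0] * roots[1] * roots[2])
    return p, roots, constant


p, roots, constant = nice_cubic(3, 1)
print(f"f(x) = x^3 - {3 * p * p}x + {constant}, roots {roots}")
print("critical points at x =", (-p, p), "inflection point at x = 0")
```

Some inputs give repeated roots (for instance $r = 2$, $s = 1$), so a generator for classroom use would want to filter those out.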

Creating examples like this is not hard by hand, but a computer algebra 
system makes it automatic. The next table was generated by a CAS, and it 
shows the results of our algorithm for small values of r and s . 



[Table: cubics produced by the algorithm for small values of r (rows) and s (columns s = 1, 2, 3). The entry for r = 2, s = 1 is 54 − 27x + x³.]

See Exercise 3.70 on page 128.

A lattice point is a point with integer coordinates.

See Exercise 1.25 on page 14.


All these cubics have coefficient of x 2 equal to 0. If you’d like examples 
where this is not the case, just replace x by, say, x + 1 and simplify. Again, a 
CAS makes this easy. 

Lattice Point Triangles 

Our last meta-problem arises when illustrating the distance formula. 

How can you find three lattice points A, B , and C in the plane so that 
the distance between any two of them is an integer? 

Clearly, solutions are invariant under translation by a lattice point; that is, if $A$, $B$, and $C$ form a lattice point solution and $U$ is any lattice point, then $A - U$, $B - U$, and $C - U$ form another solution, since $d(A - U, B - U) = d(A, B)$ (where $d(P, Q)$ is the distance between points $P$ and $Q$; we have $d(A, B) = |A - B|$). Hence, we can assume that one of the points, say $C$, is at the origin.

Now view the plane as the complex plane, so that lattice points are Gaussian integers. Thus, we want Gaussian integers $z$ and $w$ such that $|z|$, $|w|$, and $|z - w|$ are integers. But if $z = a + bi$, then

$$|z| = \sqrt{a^2 + b^2} = \sqrt{N(z)}.$$

Hence, to make the length an integer, make the norm a perfect square and, to make the norm a perfect square, make the Gaussian integer a perfect square in $\mathbb{Z}[i]$. That is, we want Gaussian integers $z$ and $w$ so that $z$, $w$, and $z - w$ are perfect squares in $\mathbb{Z}[i]$. Hence, we choose $z$ and $w$ so that

$$\begin{aligned}
z &= \alpha^2 &&\text{for some } \alpha \in \mathbb{Z}[i]\\
w &= \beta^2 &&\text{for some } \beta \in \mathbb{Z}[i]\\
z - w &= \gamma^2 &&\text{for some } \gamma \in \mathbb{Z}[i].
\end{aligned}$$

In other words, we want Gaussian integers $\alpha$, $\beta$, and $\gamma$ so that

$$\alpha^2 - \beta^2 = \gamma^2$$

or

$$\alpha^2 = \beta^2 + \gamma^2.$$

The punchline is that one of our favorite identities,

$$(x^2 + y^2)^2 = (x^2 - y^2)^2 + (2xy)^2,$$

which holds in any commutative ring, holds, in particular, in $\mathbb{Z}[i]$. So, the trick is to pick Gaussian integers $x$ and $y$, set

$$\alpha = x^2 + y^2, \qquad \beta = x^2 - y^2,$$

and then let

$$z = \alpha^2, \qquad w = \beta^2.$$

Example 3.42. Pick $x = 2 + i$ and $y = 3 + 2i$. Then

$$\alpha = x^2 + y^2 = 8 + 16i \quad\text{and}\quad \beta = x^2 - y^2 = -2 - 8i.$$

Now put

$$z = \alpha^2 = -192 + 256i \quad\text{and}\quad w = \beta^2 = -60 + 32i.$$

Hence, $(0, 0)$, $(-192, 256)$, and $(-60, 32)$ are vertices of an integer-sided triangle. Moreover, adding a lattice point to each vertex produces another such triangle with no vertex at the origin. Once again, a CAS can be used to generate many more. ▲
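Here is one way the construction of Example 3.42 might be automated without a CAS. The Python sketch below is our own; Gaussian integers are stored as integer pairs `(a, b)` for $a + bi$, and the names `gmul`, `integer_length`, and `lattice_triangle` are hypothetical. It returns the two nonzero vertices together with the three side lengths, raising an error if any length fails to be an integer.

```python
from math import isqrt


def gmul(z, w):
    """Product of Gaussian integers stored as (real, imaginary) pairs."""
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + b * c)


def integer_length(z):
    """|z| as an exact integer; raises ValueError if |z| is not an integer."""
    a, b = z
    n = a * a + b * b
    root = isqrt(n)
    if root * root != n:
        raise ValueError("norm is not a perfect square")
    return root


def lattice_triangle(x, y):
    """Vertices z, w (third vertex at the origin) and the three side lengths."""
    x2, y2 = gmul(x, x), gmul(y, y)
    alpha = (x2[0] + y2[0], x2[1] + y2[1])
    beta = (x2[0] - y2[0], x2[1] - y2[1])
    z, w = gmul(alpha, alpha), gmul(beta, beta)
    diff = (z[0] - w[0], z[1] - w[1])
    return z, w, (integer_length(z), integer_length(w), integer_length(diff))


print(lattice_triangle((2, 1), (3, 2)))
```

The third side is an integer because $z - w = \alpha^2 - \beta^2 = (2xy)^2$ is itself a square in $\mathbb{Z}[i]$.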

This is just the beginning; many research problems are generalizations of 
meta-problems. Fermat’s Last Theorem started as a search for integer solutions 
to equations like the Pythagorean equation but with larger exponents. 

There are many other meta-problems that yield to these two methods: norms from $\mathbb{Z}[i]$ or $\mathbb{Z}[\omega]$; rational points on the unit circle or on the graph of $x^2 - xy + y^2 = 1$. Still others can be solved with norms from other number systems or from rational points on other curves. In Chapter 9, we will see that congruent numbers lead to rational points on certain cubic curves.


Exercises 

3.60 For each integer n between 3 and 9, find a polynomial of smallest degree with 
integer coefficients whose roots are the primitive nth roots of unity. 

3.61 * Let $a$ and $b$ be real numbers, and let $z$ be a complex number.

(i) Show that $\overline{a + bz} = a + b\overline{z}$.

(ii) Show that $N(a + bz) = a^2 + 2\Re(z)ab + b^2N(z)$.

3.62 * If $z$ and $w$ are complex numbers, show that $N(z) < N(w)$ if and only if $|z| < |w|$.

3.63 Let A be an isosceles triangle with side lengths 13, 13, and 10. 

(i) Show that the altitude to the base has length 12, and that it divides A into two 
5, 12, 13 triangles. 

(ii) Show that the altitude to one of the sides of length 13 divides A into two right 
triangles whose side lengths are rational. 

(iii) Each of the side lengths can thus be scaled to get a Pythagorean triple. Show that one triple is similar to $(5, 12, 13)$ and that the other comes from $(5 + 12i)^2$.

(iv) Generalize this result to any isosceles triangle formed by two copies of a Pythagorean triple, joined along a leg.

3.64 * Let $g$ and $h$ be complex numbers such that $\overline{g^3} = h^3$.

(i) Show that $\overline{g}$ is equal to either $h$, $\omega h$, or $\omega^2 h$, where $\omega = \tfrac12(-1 + i\sqrt3)$.

(ii) If $gh$ is also real, show that $\overline{g} = h$.

3.65 * Suppose that $g$ is the cube root found in Example 3.36, $h = \overline{g}$, and $\omega = \cos(2\pi/3) + i\sin(2\pi/3)$. Find the value of

(i) $g\omega + h\omega^2$

(ii) $g\omega^2 + h\omega$

Replacing $x$ by $x + 1$ just translates the graph by one unit. Which way?

3.66 * Sketch the graph of $x^2 - axy + y^2 = 1$ for

(i) $a = -1$

(ii) $a = 1$

(iii) $a = 2$

(iv) $a = 3$

(v) $a = 2$

3.67 In Theorem 1.5, we saw that every Pythagorean triple is similar to one of the form

$$(2xy,\ x^2 - y^2,\ x^2 + y^2).$$

Show how this can be obtained via the “norm from $\mathbb{Z}[i]$” method.

3.68 Obtain a formula for Eisenstein triples analogous to the one for Gaussian integers in Theorem 1.5 using norms from $\mathbb{Z}[\omega]$ and rational points on the graph of $x^2 - xy + y^2 = 1$.

3.69 * Assume that the square of the Eisenstein integer $r + s\omega$ is used to generate an Eisenstein triple, and that the triple is used to create a “nice box,” as on page 123. Express the volume of the box in terms of $r$ and $s$.

3.70 * Replace x by x + 1 in several of the cubics in the table on page 125 to produce 
nice cubics whose coefficient of x 2 is nonzero. Show that your cubics are indeed 
nice. 

3.71 Describe where Gaussian integers are situated in the complex plane. 

3.72 * Suppose that $a$ and $b$ are real numbers and

$$\omega = \cos(2\pi/3) + i\sin(2\pi/3) = \tfrac12(-1 + i\sqrt3).$$

Show that

$$N(a + b\omega) = a^2 - ab + b^2.$$

3.73 Describe where Eisenstein integers are situated in the complex plane. 

3.74 Find an integer-sided triangle one of whose angles has cosine equal to 3/5. 

Hint. Let p = cos(^) + i sin(^) and consider norms from Z[p]. What conic 
would help here? 

3.75 A Heron triangle is a triangle with integer side lengths and integer area. In Ex- 
ercise 1.26 on page 14, you found a Heron triangle by joining two Pythagorean 
triangles together along a common leg. Show that the following method also pro- 
duces Heron triangles. 

Pick a rational point $(\cos\theta, \sin\theta)$ on the unit circle, where $0 < \theta < \pi$, and let $\alpha = -\cos\theta + i\sin\theta$. Then pick any number $z$ of the form $r + s\alpha$, where $r$ and $s$ are rational numbers and $r > s > 0$.

(i) What is the norm of $r + s\alpha$?

(ii) Show that

$$\alpha^2 + 2\alpha\cos\theta + 1 = 0.$$

(iii) Show that if $z^2 = a + b\alpha$, then the triangle with side lengths $a$ and $b$ and included angle $\theta$ will have a rational number, say $c$, as its third side length and a rational number as an area. (This triangle can be then scaled to produce a Heron triangle.) Use this method to generate a few Heron triangles.

3.76 Show that a triangle with lattice point vertices and integer side-lengths is a Heron triangle.

3.77 Take It Further. Here’s a typical current problem, taken from B. Kerins, Gauss,
Pythagoras, and Heron, Mathematics Teacher, 2003, 350-357: 

A boat is making a round trip, 135 miles in each direction. Without a cur- 
rent, the boat’s speed would be 32 miles per hour. However, there is a con- 
stant current that increases the boat’s speed in one direction and decreases 
it in the other. If the round trip takes exactly 9 hours, what is the speed of 
the current? 

(i) Solve the problem. 

(ii) Solve the corresponding meta-problem: find a method for generating current 
problems that come out nice. 





Modular Arithmetic 



Theorems about integers can be generalized to other interesting contexts. For example, an early attack on Fermat’s Last Theorem was to factor $x^n + y^n$ ($n$ odd) as in Exercise 3.50 on page 115:

$$x^n + y^n = (x + y)(x + \zeta y)\cdots(x + \zeta^{n-1} y),$$

where $\zeta = e^{2\pi i/n}$ is an $n$th root of unity. It turns out that the most fruitful way to understand this factorization is within the system $\mathbb{Z}[\zeta]$ of cyclotomic integers, the collection of all polynomials in $\zeta$ with coefficients in $\mathbb{Z}$ (a common generalization of Gaussian integers $\mathbb{Z}[i]$ and Eisenstein integers $\mathbb{Z}[\omega]$). Numbers in these systems can be added and multiplied, and they satisfy all but one of the nine fundamental properties that ordinary numbers do (reciprocals of cyclotomic integers need not be such); we will call such systems commutative rings. But for some roots of unity $\zeta$, the commutative ring $\mathbb{Z}[\zeta]$ does not enjoy the unique factorization property that $\mathbb{Z}$, $\mathbb{Z}[i]$, and $\mathbb{Z}[\omega]$ have, and this caused early “proofs” of Fermat’s Last Theorem to be false. Dealing with the lack of unique factorization was one important problem that led naturally to the modern way of studying algebra.

In Section 4.1, we shall see that the distinction between even and odd can be generalized, using congruences: studying remainders in the Division Algorithm. It turns out, as we’ll see in Section 4.3, that, for any fixed positive integer $m$, the set of its remainders, $0, 1, \ldots, m - 1$, can be viewed as a commutative ring, as can cyclotomic integers, and they behave in many, but not all, ways as do ordinary integers. Finally, in Section 4.5, we’ll apply these results to an analysis of decimal expansions of rational numbers.

We’ll discuss cyclotomic integers in Chapter 8.

It turns out that many of the “number systems” studied in high school are commutative rings.


4.1 Congruence 

It is often useful to know the parity of an integer n ; that is, whether n is even 
or odd (why else would these words be in the language?). But n being even or 
odd is equivalent to whether its remainder after dividing by 2 is 0 or 1 . Modular 
arithmetic, introduced by Euler around 1750, studies the generalization of par- 
ity arising from considering remainders after dividing by any positive integer. 
At a low level, it will help us answer questions of the following sort: 

• London time is 6 hours ahead of Chicago time; if it is now 9:00 AM in 
Chicago, what time is it in London? 

• If April 12 falls on a Thursday this year, on what day of the week is May 26? 
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Note that a = 0 mod m if 
and only if m \ a. 


Modular arithmetic is called 
“clock arithmetic” in some 
introductory texts. 


At a more sophisticated level, it will allow us to solve some difficult number 
theoretic problems. 


Definition. Let m > 0 be an integer, if a, h e Z, then a is congruent to b 
modulo m, denoted by 

a = b mod m , 


if m | ( a — b). 


Etymology. The number m in the expression a = b mod m is called the 
modulus, the Latin word meaning a standard unit of measurement. The term 
modular unit is used today in architecture: a fixed length m is chosen, say, 
m = 1 foot, and plans are drawn so that the dimensions of every window, 
door, wall, etc., are integral multiples of m. 


We claim that integers a and b have the same parity if and only if a ≡
b mod 2. Assume that a and b have the same parity. If both are even, then
a = 2a' and b = 2b'. Hence, a − b = 2(a' − b'), 2 | (a − b), and a ≡
b mod 2. Similarly, if both are odd, then a = 2a' + 1 and b = 2b' + 1. Hence,
a − b = (2a' + 1) − (2b' + 1) = 2(a' − b'), 2 | (a − b), and a ≡ b mod 2
in this case as well. Conversely, suppose that a ≡ b mod 2. If a and b have
different parity, then one is even, the other is odd, and so their difference is
odd. Hence, 2 ∤ (a − b), and a ≢ b mod 2. Having proved the contrapositive,
we may now assert that a and b have the same parity.

Example 4.1. If a ≡ r mod m, then r is obtained from a by throwing out
a multiple of m. For example, let’s compute the time of day using a 12-hour
clock. When adding 6 hours to 9:00, the answer, 3:00, is obtained by taking
9 + 6 = 15 ≡ 3 mod 12 (i.e., we throw away 12). In more detail, let 0 denote
12:00, 1 denote 1:00, ..., 11 denote 11:00. Three hours after 9:00 is 12:00;
that is, 9 + 3 = 12 ≡ 0 mod 12; 4 hours after 9:00 is 1:00; that is, 9 + 4 =
13 ≡ 1 mod 12, and 6 hours after 9:00 is 3:00; that is, 9 + 6 = 15 ≡ 3 mod 12.

The same idea applies to calendars. Let 0 denote Sunday, 1 denote Monday,
..., 6 denote Saturday.

Sun   Mon   Tues   Wed   Thurs   Fri   Sat
 0     1     2      3      4      5     6


If today is Tuesday, what day of the week is 90 days from now? Since 2 + 90 =
92 ≡ 1 mod 7, the answer is Monday.

Let’s now answer the question: if April 12 falls on Thursday this year, on
what day of the week is May 26? There are 18 days to April 30, so there are
18 + 26 = 44 days until May 26 (for April has only 30 days). Now Thursday
corresponds to 4, so that May 26 corresponds to 4 + 44 = 48 ≡ 6 mod 7;
therefore, May 26 falls on Saturday. ▲
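
The clock and calendar computations of Example 4.1 can be sketched in a few lines of Python. The function names below are ours, chosen only for illustration; the numbering of the days (0 for Sunday through 6 for Saturday) and of the hours (0 standing for 12:00) follows the example.

```python
DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday"]

def day_after(start_day: int, elapsed: int) -> str:
    """Day of the week `elapsed` days after day number `start_day` (0 = Sunday)."""
    return DAYS[(start_day + elapsed) % 7]

def clock_after(hour: int, elapsed: int) -> int:
    """Hour shown on a 12-hour clock (0 stands for 12:00) after `elapsed` hours."""
    return (hour + elapsed) % 12

print(clock_after(9, 6))       # 9:00 plus 6 hours
print(day_after(2, 90))        # 90 days after a Tuesday
print(day_after(4, 18 + 26))   # April 12 (a Thursday) to May 26
```

In each case the sum is formed first and only then reduced; throwing out multiples of the modulus is all that the `%` operator does here.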

There are at least two ways to state the solutions of Exercises 3.2 and 3.3
on page 89. We expected you to say then that i^n = i^m if and only if n and m
leave the same remainder when divided by 4 and, if ω is a primitive cube root
of unity, that ω^n = ω^m if and only if n and m leave the same remainder when
divided by 3. In light of the next proposition, we can also say that i^n = i^m if
and only if 4 | (n − m); that is, n ≡ m mod 4, and ω^n = ω^m if and only if
3 | (n − m); that is, n ≡ m mod 3.

Proposition 4.2. Let m ≥ 2 and a, b ∈ Z.

(i) If a = qm + r, then a ≡ r mod m.

(ii) a ≡ b mod m if and only if a and b have the same remainder
after dividing by m.

Proof. (i) Since a − r = qm, we have m | (a − r); that is, a ≡ r mod m.

(ii) Assume a ≡ b mod m. Let r, r' be the remainders after dividing a, b,
respectively, by m; that is, a = qm + r and b = q'm + r', where 0 ≤ r < m
and 0 ≤ r' < m. We want to show that r' = r. If not, suppose that r' < r
(the argument is the same if r < r'). Then a − b = m(q − q') + (r − r')
with 0 < r − r' < m. Now Exercise 1.46 on page 29 gives m | (r − r').
Hence, m ≤ r − r', by Lemma 1.13, contradicting r − r' < m.

Conversely, if a = qm + r, b = q'm + r', and r = r', then a − b =
m(q − q') and a ≡ b mod m. ■
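
Proposition 4.2(ii) says that two tests for congruence always agree. A minimal Python sketch can compare them on a small range; the function names are ours, and we rely on the fact that Python's `%` with a positive modulus returns the remainder r with 0 ≤ r < m, even for negative integers.

```python
def congruent(a: int, b: int, m: int) -> bool:
    """a ≡ b mod m, straight from the definition: m divides a - b."""
    return (a - b) % m == 0

def same_remainder(a: int, b: int, m: int) -> bool:
    """a and b leave the same remainder r, 0 <= r < m, on division by m."""
    return a % m == b % m

# Exhaustive comparison on a small range, negative integers included.
for m in range(2, 13):
    for a in range(-30, 31):
        for b in range(-30, 31):
            assert congruent(a, b, m) == same_remainder(a, b, m)

print("the two tests agree on the sampled range")
```

A finite check like this is evidence, not a proof; the proof is the argument given above.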

Notice that Proposition 4.2 generalizes the fact that integers a and b have
the same parity if and only if a ≡ b mod 2.

We are now going to see that congruence modulo m behaves very much 
like ordinary equality; more precisely, it is an equivalence relation (see Ap- 
pendix A.2): it is reflexive, symmetric, and transitive. 

Proposition 4.3. Let m ≥ 0. For all integers a, b, c, we have

(i) a ≡ a mod m;

(ii) if a ≡ b mod m, then b ≡ a mod m;

(iii) if a ≡ b mod m and b ≡ c mod m, then a ≡ c mod m.

Proof. All are easy to check. We have a ≡ a mod m, because a − a = 0 and
m | 0 is always true (even when m = 0). Since b − a = −(a − b), if m | (a − b),
then m | (b − a). Finally, (a − b) + (b − c) = a − c, so that if m | (a − b) and
m | (b − c), then m | (a − c). ■


How to Think About It. Congruence mod 1 makes sense, but it is not very
interesting, for a ≡ b mod 1 if and only if 1 | (a − b). But this latter condition
is always true, for 1 is a divisor of every integer. Thus, every two integers are
congruent mod 1. Similarly, congruence mod 0 makes sense, but it, too, is not
very interesting, for 0 | c if and only if c = 0. Thus, a ≡ b mod 0 if and only
if 0 | (a − b); that is, a ≡ b mod 0 if and only if a = b, and so congruence
mod 0 is just ordinary equality. You should not be surprised that we usually
assume that m ≥ 2.


If 0 | c, then there is some
k with c = 0 · k = 0; that
is, c = 0.


Corollary 4.4. If m ≥ 2, then every integer a is congruent mod m to exactly
one integer on the list

0, 1, ..., m − 1.

See Exercise 4.5 on
page 140 for a generalization.

Proof. By the Division Algorithm, we have a = qm + r, where 0 ≤ r < m;
that is, a ≡ r mod m.

If a were congruent to two integers on the list, say, r < r', then r ≡
r' mod m by transitivity, so that m | (r' − r). Since 0 < r' − r < m, this
would contradict Lemma 1.13. ■

Congruence gets along well with addition and multiplication. 

Proposition 4.5. Let m ≥ 0.

(i) If a ≡ a' mod m and b ≡ b' mod m, then

a + b ≡ a' + b' mod m.

More generally, if a_i ≡ a'_i mod m for i = 1, ..., k, then
a_1 + ··· + a_k ≡ a'_1 + ··· + a'_k mod m.

(ii) If a ≡ a' mod m and b ≡ b' mod m, then

ab ≡ a'b' mod m.

More generally, if a_i ≡ a'_i mod m for i = 1, ..., k, then
a_1 ··· a_k ≡ a'_1 ··· a'_k mod m.

(iii) If a ≡ b mod m, then

a^k ≡ b^k mod m for all k ≥ 1.

Proof. (i) If m | (a − a') and m | (b − b'), then m | (a + b) − (a' + b'),
because (a + b) − (a' + b') = (a − a') + (b − b'). The generalization to
k summands follows by induction on k ≥ 2.

(ii) We must show that if m | (a − a') and m | (b − b'), then m | (ab − a'b').
This follows from the identity

ab − a'b' = ab − ab' + ab' − a'b' = a(b − b') + (a − a')b'.

The generalization to k factors follows by induction on k ≥ 2.

(iii) This is the special case of part (ii) in which all a_i = a and all a'_i = b. ■


How to Think About It. The key idea in calculating with congruences 
mod m is that every number can be replaced by its remainder after dividing 
by m, for this is precisely what Proposition 4.5 permits; it allows you to “re- 
duce as you go” in calculations, as the next example shows. 


Example 4.6. The last (units) digit of a positive integer is the remainder when
it is divided by 10. What is the last digit of

10324^3 + 2348 · 5267?

We could do this by brute force: cube 10324, multiply 2348 and 5267, add, and
look at the last digit. But, as one of our friends says, why should the calculator
have all the fun? You can do this more cleverly using congruence.

• To compute 10324^3, first look at 10324:

10324 ≡ 4 mod 10, so that 10324^3 ≡ 4^3 mod 10.

Now 4^3 = 64 ≡ 4 mod 10, so that 10324^3 ≡ 4 mod 10, and the last digit
of 10324^3 is 4.

• To multiply 2348 and 5267, note that 2348 ≡ 8 mod 10 and 5267 ≡
7 mod 10. Hence,

2348 · 5267 ≡ 8 · 7 = 56 ≡ 6 mod 10.

More simply, think of
multiplying 2348 and 5267
by hand. What’s the last
digit? This is what most
middle school students
would do. We just want
to illustrate the general
principle here.

Thus,

10324^3 + 2348 · 5267 ≡ 4 + 6 = 10 mod 10,

and 0 is the last digit; 10324^3 + 2348 · 5267 is divisible by 10.

Now you try one: what is the last digit of 75284^3 + 10988 · 310767? ▲
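
The “reduce as you go” idea of Example 4.6 can be compared with brute force in a short Python sketch. The two function names are ours; the built-in three-argument `pow(x, e, m)` computes x^e mod m.

```python
def last_digit_reduced() -> int:
    """Reduce every operand mod 10 before combining, as in Example 4.6."""
    m = 10
    cube = pow(10324 % m, 3, m)               # 4^3 = 64, which is 4 mod 10
    product = ((2348 % m) * (5267 % m)) % m   # 8 * 7 = 56, which is 6 mod 10
    return (cube + product) % m

def last_digit_brute_force() -> int:
    """Compute the full integer first, then take the remainder."""
    return (10324**3 + 2348 * 5267) % 10

print(last_digit_reduced(), last_digit_brute_force())
```

Both routes give the same digit, as Proposition 4.5 guarantees; the reduced version never handles a number larger than 64.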


The next example uses congruence to solve more difficult problems. 

Example 4.7. (i) If a ∈ Z, then a^2 ≡ 0, 1, or 4 mod 8.

If a is an integer, then a ≡ r mod 8, where 0 ≤ r ≤ 7; moreover,
by Proposition 4.5(iii), a^2 ≡ r^2 mod 8, and so it suffices to look at the
squares of the remainders. We see in Figure 4.1 that only 0, 1, or 4 can be
a remainder after dividing a perfect square by 8.

r           0   1   2   3   4    5    6    7
r^2         0   1   4   9   16   25   36   49
r^2 mod 8   0   1   4   1   0    1    4    1

Figure 4.1. Squares mod 8.


(ii) n = 1003456789 is not a perfect square.

Since 1000 = 8 · 125, we have 1000 ≡ 0 mod 8, and so

1003456789 = 1003456 · 1000 + 789 ≡ 789 mod 8.

Dividing 789 by 8 leaves remainder 5; that is, n ≡ 5 mod 8. But if n were
a perfect square, then n ≡ 0, 1, or 4 mod 8.

(iii) There are no perfect squares of the form 3^m + 3^n + 1, where m and n are
positive integers.

Again, let’s look at remainders mod 8. Now 3^2 = 9 ≡ 1 mod 8, and
so we can evaluate 3^m mod 8 as follows: if m = 2k, then 3^m = 3^{2k} =
9^k ≡ 1 mod 8; if m = 2k + 1, then 3^m = 3^{2k+1} = 3^{2k} · 3 ≡ 3 mod 8.
Thus,

3^m ≡ 1 mod 8 if m is even
3^m ≡ 3 mod 8 if m is odd.

Replacing numbers by their remainders after dividing by 8, we have the
following possibilities for the remainder of 3^m + 3^n + 1, depending on
the parities of m and n:

3 + 1 + 1 ≡ 5 mod 8
3 + 3 + 1 ≡ 7 mod 8
1 + 1 + 1 ≡ 3 mod 8
1 + 3 + 1 ≡ 5 mod 8.

In no case is the remainder 0, 1, or 4, and so no number of the form
3^m + 3^n + 1 can be a perfect square, by part (i). ▲
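
All three parts of Example 4.7 can be checked numerically. The sketch below is only a sanity check under our own choices: the helper `is_perfect_square` and the exponent range 1 to 40 are assumptions made for illustration, and a finite range proves nothing beyond itself.

```python
from math import isqrt

# Part (i): the residues of r^2 mod 8 for r = 0, ..., 7.
square_residues = {(r * r) % 8 for r in range(8)}

def is_perfect_square(n: int) -> bool:
    """Exact integer test, used only to cross-check the congruence argument."""
    return n >= 0 and isqrt(n) ** 2 == n

# Part (ii): the residue of n mod 8 rules out a perfect square.
n = 1003456789
residue = n % 8

# Part (iii): residues of 3^m + 3^k + 1 mod 8 over a finite range of exponents.
sum_residues = {(pow(3, m, 8) + pow(3, k, 8) + 1) % 8
                for m in range(1, 41) for k in range(1, 41)}

print(sorted(square_residues), residue, sorted(sum_residues))
```

The set of residues in part (iii) is disjoint from the set of square residues, which is exactly the content of the argument above.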

Many beginning algebra students wish that (a + b)^p = a^p + b^p in Z; if only
(a + b)^2 = a^2 + b^2! The next proposition (which paraphrases Exercise 7.27
on page 293) would delight them. If theorems were movies, Proposition 4.8
would be X-rated: only adults would be allowed to see it.

Proposition 4.8. If p is a prime and a, b are integers, then

(a + b)^p ≡ a^p + b^p mod p.

Proof. The Binomial Theorem gives

(a + b)^p = \sum_{r=0}^{p} \binom{p}{r} a^{p−r} b^r.

But \binom{p}{r} ≡ 0 mod p for all r with 0 < r < p, by Proposition 2.26. The result
now follows from Proposition 4.5(i). ■

The next theorem (sometimes called the Little Fermat Theorem to distinguish
it from Fermat’s Last Theorem) turns out to be very useful.

See Corollary 4.67 for
another proof.

Theorem 4.9 (Fermat). Let p be a prime and a ∈ Z.

(i) a^p ≡ a mod p.

(ii) a^{p^n} ≡ a mod p for all n ≥ 1.

(iii) If p ∤ a, then a^{p−1} ≡ 1 mod p.

Proof. (i) We first prove the statement when a ≥ 0, by induction on a. The
base step a = 0 is obviously true. For the inductive step, the inductive
hypothesis is a^p ≡ a mod p. Hence, Proposition 4.8 gives

(a + 1)^p ≡ a^p + 1 ≡ a + 1 mod p.

To complete the proof, consider −a, where a > 0; now

(−a)^p = (−1)^p a^p ≡ (−1)^p a mod p.

If p is an odd prime (indeed, if p is odd), then (−1)^p = −1, and (−1)^p a =
−a, as desired. If p = 2, then (−a)^2 = a^2 ≡ a ≡ −a mod 2, and we are
finished in this case as well.

(ii) The proof is by induction on n ≥ 1: the base step is part (i), while the
inductive step follows from the identity a^{p^n} = (a^{p^{n−1}})^p.

(iii) By part (i), p | (a^p − a); that is, p | a(a^{p−1} − 1). Since p ∤ a, Euclid’s
Lemma gives p | (a^{p−1} − 1); that is, a^{p−1} ≡ 1 mod p. ■

Later in this chapter, we will use the next corollary to construct codes that
are extremely difficult for spies to decode.

Corollary 4.10. If p is a prime and m ≡ 1 mod (p − 1), then a^m ≡ a mod p
for all a ∈ Z.

Proof. If a ≡ 0 mod p, then a^m ≡ 0 mod p, and so a^m ≡ a mod p. Assume
now that a ≢ 0 mod p; that is, p ∤ a. By hypothesis, m − 1 = k(p − 1) for
some integer k, and so m = 1 + (p − 1)k. Therefore,

a^m = a^{1+(p−1)k} = a · a^{(p−1)k} = a(a^{p−1})^k ≡ a mod p,

for a^{p−1} ≡ 1 mod p, by Theorem 4.9(iii). ■
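
Fermat’s Theorem and Corollary 4.10 are easy to test for small primes. The following Python sketch is a numerical spot check, not a proof; the function names and the sampled ranges of a and k are our own choices.

```python
def check_fermat(p: int) -> bool:
    """Check parts (i) and (iii) of Theorem 4.9 for p and a window of integers a."""
    for a in range(-3 * p, 3 * p + 1):
        if pow(a, p, p) != a % p:                  # (i): a^p ≡ a mod p
            return False
        if a % p != 0 and pow(a, p - 1, p) != 1:   # (iii): a^(p-1) ≡ 1 mod p
            return False
    return True

def check_corollary(p: int, k_max: int = 5) -> bool:
    """Check Corollary 4.10: a^m ≡ a mod p whenever m = 1 + k(p - 1), k >= 0."""
    return all(pow(a, 1 + k * (p - 1), p) == a % p
               for k in range(k_max + 1) for a in range(-p, 2 * p))

print(all(check_fermat(p) and check_corollary(p) for p in (2, 3, 5, 7, 11, 13)))
print(check_fermat(4))   # 4 is not prime, and the check fails
```

The second line of output shows that primality matters: for the modulus 4, already 2^4 = 16 is not congruent to 2.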

We can now explain a well-known divisibility test. 

Proposition 4.11. A positive integer a is divisible by 3 if and only if the sum
of its (decimal) digits is divisible by 3.

Proof. The decimal notation for a is d_k ... d_1 d_0; that is,

a = d_k 10^k + ··· + d_1 10 + d_0,

where 0 ≤ d_i < 10 for all i. Now 10 ≡ 1 mod 3, and Proposition 4.5(iii)
gives 10^i ≡ 1^i = 1 mod 3 for all i; thus parts (i) and (ii) of Proposition 4.5
give a ≡ d_k + ··· + d_1 + d_0 mod 3. Therefore, a is divisible by 3 if and only
if a ≡ 0 mod 3 if and only if d_k + ··· + d_1 + d_0 ≡ 0 mod 3. ■


How to Think About It. The proof of Proposition 4.11 shows more than its
statement claims: the sum of the (decimal) digits of any positive integer a is
congruent to a mod 3, whether or not a is divisible by 3. For example,

172 ≡ 1 + 7 + 2 mod 3;

that is, both 172 and 10 (the sum of its digits) are ≡ 1 mod 3.


Since 10 ≡ 1 mod 9, Proposition 4.11 holds if we replace 3 by 9 (it is often
called casting out 9s): A positive integer a is divisible by 9 if and only if the
sum of its digits, Σ(a), is divisible by 9.

Define two operations on the decimal digits of a positive integer a.

(i) Delete all 9s (if any) and delete any group of digits whose sum is 9.

(ii) Add up all the digits.

It is easy to see that repeated applications of these operations to a positive
integer a yield a single digit; call it r(a). For example,

5261934 → 526134 → 561 (for 2 + 3 + 4 = 9) → 12 → 3.

(It is now clear why this procedure is called casting out 9s.) In light of a ≡
Σ(a) mod 9, we have Σ(a) ≡ r(a) mod 9, so that r(a), which seems to depend
on a choice of operations (i) and (ii), depends only on a, for the variation
of Proposition 4.11 for 9 says that Σ(a), and hence r(a), has the same remainder
as a after dividing by 9.


Is the sum of the decimal
digits of an integer a
congruent mod 9 to a
itself?

Before today’s calculators, casting out 9s was used by bookkeepers to detect
errors in calculations (alas, it could not detect all errors). For example, suppose
the end of a calculation gave the equation

(22345 + 5261934)1776 = 9347119504.

Casting out 9s from each number gives

(7 + 3)3 = 7,

for r(22345) = 7, r(5261934) = 3, r(1776) = 3, and r(9347119504) = 7.
But (7 + 3)3 = 30 ≡ 3 mod 9, not 7 mod 9, and so there was a mistake in the
calculation.
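
The bookkeeper’s check can be sketched in Python. The function names are ours, and we adopt the convention (an assumption, not something fixed by the text) that a final digit 9 is identified with 0, so that the result is exactly the remainder mod 9.

```python
def digit_sum(a: int) -> int:
    """Sum of the decimal digits of a nonnegative integer."""
    return sum(int(ch) for ch in str(a))

def cast_out_nines(a: int) -> int:
    """Repeat the digit sum until one digit remains, then identify 9 with 0.

    The result equals a % 9; treating a final 9 as 0 is a convention chosen here.
    """
    while a >= 10:
        a = digit_sum(a)
    return a % 9

def passes_check(x: int, y: int, z: int, claimed: int) -> bool:
    """Necessary (not sufficient) test of the claim (x + y) * z == claimed."""
    lhs = (cast_out_nines(x) + cast_out_nines(y)) * cast_out_nines(z)
    return lhs % 9 == cast_out_nines(claimed)

print(passes_check(22345, 5261934, 1776, 9347119504))
print(passes_check(22345, 5261934, 1776, (22345 + 5261934) * 1776))
```

The first claim fails the check and the correct product passes it. The check is only one-directional: an error that swaps two digits of the answer leaves the digit sum unchanged and goes undetected.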

The word “bookkeeper” is unusual in that it has three consecutive double 
letters: oo, kk, ee. This reminds us of a silly story about a word having six 
consecutive double letters. A zoo discovered that one of its animals, Ricky 
the raccoon, was quite remarkable. Ricky was a born showman: he could do 
somersaults, hang by his tail, and give wonderful soft-shoe dances whenever 
spectators sang. As his fame spread, the zoo provided him with a special cage 
containing a private corner where he could unwind after popular performances. 
Crowds came from far and wide to see him. Indeed, Ricky became so
famous that the zoo was forced to hire attendants to take care of his needs. In 
particular, someone was sought to maintain Ricky’s corner; the job description: 
raccoonnookkeeper. 

The usual decimal notation for the integer 5754 is an abbreviation of

5 · 10^3 + 7 · 10^2 + 5 · 10 + 4.

But there is nothing special about the number 10. 

Example 4.12. Let’s write 12345 in “base 7.” Repeated use of the Division
Algorithm gives

12345 = 1763 · 7 + 4
 1763 = 251 · 7 + 6
  251 = 35 · 7 + 6
   35 = 5 · 7 + 0
    5 = 0 · 7 + 5.

Back substituting (i.e., working from the bottom up),

0 · 7 + 5 = 5
5 · 7 + 0 = 35
(0 · 7 + 5) · 7 + 0 = 35
35 · 7 + 6 = 251
((0 · 7 + 5) · 7 + 0) · 7 + 6 = 251
251 · 7 + 6 = 1763
(((0 · 7 + 5) · 7 + 0) · 7 + 6) · 7 + 6 = 1763
1763 · 7 + 4 = 12345
((((0 · 7 + 5) · 7 + 0) · 7 + 6) · 7 + 6) · 7 + 4 = 12345.

Expanding and collecting terms gives

5 · 7^4 + 0 · 7^3 + 6 · 7^2 + 6 · 7 + 4 = 12005 + 0 + 294 + 42 + 4 = 12345. ▲


This idea works for any integer b ≥ 2.

Proposition 4.13. If b ≥ 2 is an integer, then every positive integer h has an
expression in base b: there are unique integers d_i with 0 ≤ d_i < b such that

h = d_k b^k + d_{k−1} b^{k−1} + ··· + d_0.

Proof. We first prove the existence of such an expression, by induction on h.
By the Division Algorithm, h = qb + r, where 0 ≤ r < b. If q = 0, then h = r
is already such an expression. Otherwise, since b ≥ 2, we
have h = qb + r ≥ qb ≥ 2q. It follows that q < h: otherwise, q ≥ h, giving
the contradiction h ≥ 2q ≥ 2h. By the inductive hypothesis,
q = d'_k b^k + ··· + d'_0, and so

h = qb + r = (d'_k b^k + ··· + d'_0)b + r = d'_k b^{k+1} + ··· + d'_0 b + r.

We prove uniqueness by induction on h. Suppose that

h = d_k b^k + ··· + d_1 b + d_0 = e_m b^m + ··· + e_1 b + e_0,

where 0 ≤ e_j < b for all j; that is, h = (d_k b^{k−1} + ··· + d_1)b + d_0 and
h = (e_m b^{m−1} + ··· + e_1)b + e_0. By the uniqueness of quotient and remainder
in the Division Algorithm, we have

d_k b^{k−1} + ··· + d_1 = e_m b^{m−1} + ··· + e_1 and d_0 = e_0.

The inductive hypothesis gives k = m and d_i = e_i for all i > 0. ■

Definition. If h = d_k b^k + d_{k−1} b^{k−1} + ··· + d_0, where 0 ≤ d_i < b for all i,
then the numbers d_k, ..., d_0 are called the b-adic digits of h.

That every positive integer h has a unique expression in base 2 says that 
there is exactly one way to write h as a sum of distinct powers of 2 (for the 
only binary digits are 0 and 1). 
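
The existence proof of Proposition 4.13 is really an algorithm: divide by b repeatedly and collect the remainders. Here is a minimal Python sketch of it, together with the back substitution of Example 4.12; the function names are ours.

```python
def b_adic_digits(h: int, b: int) -> list:
    """Digits d_k, ..., d_0 of h in base b, most significant digit first."""
    if b < 2 or h <= 0:
        raise ValueError("need b >= 2 and h > 0")
    digits = []
    while h > 0:
        h, r = divmod(h, b)   # Division Algorithm: h = q*b + r with 0 <= r < b
        digits.append(r)      # remainders appear least significant first
    return digits[::-1]

def from_digits(digits: list, b: int) -> int:
    """Back substitution: (...((d_k)b + d_{k-1})b + ...)b + d_0."""
    value = 0
    for d in digits:
        value = value * b + d
    return value

print(b_adic_digits(12345, 7))
print(from_digits(b_adic_digits(12345, 7), 7))
```

Running the first function on 12345 with b = 7 reproduces the digits found in Example 4.12, and the second function recovers 12345 from them.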


Example 4.14. Let’s calculate the 13-adic digits of 441. The only complication
here is that we need 13 digits d (for 0 ≤ d < 13), and so we augment 0
through 9 with three new symbols

t = 10, e = 11, and w = 12.

Now

441 = 33 · 13 + 12
 33 = 2 · 13 + 7
  2 = 0 · 13 + 2.

So, 441 = 2 · 13^2 + 7 · 13 + 12, and the 13-adic expansion for 441 is

27w.

Note that the expansion for 33 is just 27. ▲

Example 4.12 shows that
the 7-adic digits of 12345
are 50664.

The most popular bases are b = 10 (giving everyday decimal digits), b = 2
(giving binary digits, useful because a computer can interpret 1 as “on” and 0
as “off”), and b = 16 (hexadecimal, also for computers). The Babylonians
preferred base 60 (giving sexagesimal digits).

Fermat’s Theorem enables us to compute n^{p^k} mod p for every prime p
and exponent p^k; it says that n^{p^k} ≡ n mod p. We now generalize this result
to compute n^h mod p for any exponent h.

Lemma 4.15. Let p be a prime and let n be a positive integer. If h ≥ 0, then

n^h ≡ n^{Σ(h)} mod p,

where Σ(h) is the sum of the p-adic digits of h.

This lemma generalizes
Fermat’s Theorem, for if
h = p^k, then Σ(h) = 1;
see Exercise 4.10 on
page 141.

Proof. Let h = d_k p^k + ··· + d_1 p + d_0 be the expression of h in base p.
By Fermat’s Theorem, n^{p^i} ≡ n mod p for all i; thus, n^{d_i p^i} = (n^{d_i})^{p^i} ≡
n^{d_i} mod p. Therefore,

n^h = n^{d_k p^k + ··· + d_1 p + d_0}
    = n^{d_k p^k} n^{d_{k−1} p^{k−1}} ··· n^{d_1 p} n^{d_0}
    = (n^{p^k})^{d_k} (n^{p^{k−1}})^{d_{k−1}} ··· (n^p)^{d_1} n^{d_0}
    ≡ n^{d_k} n^{d_{k−1}} ··· n^{d_1} n^{d_0} mod p
    = n^{d_k + ··· + d_1 + d_0} mod p
    = n^{Σ(h)} mod p. ■

Example 4.16. What is the remainder after dividing 3^12345 by 7? By Example 4.12,
the 7-adic digits of 12345 are 50664. Therefore, 3^12345 ≡ 3^21 mod 7
(because 5 + 0 + 6 + 6 + 4 = 21). The 7-adic digits of 21 are 30 (because
21 = 3 · 7 + 0), and so 3^21 ≡ 3^3 mod 7 (because 3 + 0 = 3). We conclude that
3^12345 ≡ 3^3 = 27 ≡ 6 mod 7. ▲

Exercises

4.1 Show that if integers a and b are congruent mod m to the same thing, say r, then
they are congruent to each other.

4.2 We saw in Exercise 1.41 on page 29 that an integer b and its negative −b can have
different remainders, say r and s, after dividing by some nonzero a. Prove that
s ≡ −r mod a.

4.3 Show that if a ≡ b mod n and m | n, then a ≡ b mod m.

4.4 A googol is 10^100; that is, 1 followed by 100 zeros. Compute the remainder mod 7
of a googol.

4.5 * (i) If m ≥ 2, show that every integer a is congruent mod m to exactly one
integer on the list

1, 2, ..., m.

(ii) Generalize Corollary 4.4 by showing that if m ≥ 2, every integer a is congruent
mod m to exactly one integer on any list of m consecutive integers.


4.6 (i) Show that every nonnegative integer is congruent mod 6 to the sum of its
7-adic digits.

(ii) Show that every nonnegative integer is congruent mod 3 to the sum of its
7-adic digits.

(iii) Suppose b and n are nonnegative integers. If n | (b − 1), show that every
integer is congruent mod n to the sum of its b-adic digits.

4.7 (i) Show that every nonnegative integer is congruent mod 11 to the alternating
sum of its decimal digits.

(ii) Show that every nonnegative integer is congruent mod b + 1 to the alternating
sum of its b-adic digits.

4.8 Let a nonnegative integer n have decimal expansion n = \sum_{i=0}^{k} d_i 10^i. Define
t(n) = d_k 10^{k−1} + ··· + d_1 − 4d_0.

(i) Show that n is divisible by 41 if and only if t(n) is.

(ii) Is n ≡ t(n) mod 41 for all nonnegative n?

4.9 Find the b-adic digits of 1000 for b = 2, 3, 4, 5, and 20.

4.10 (i) Find the 11-adic digits of 11^5.

(ii) What is the b-adic expansion for b^k (k a nonnegative integer)?

4.11 Let a be a positive integer, and let a' be obtained from a by rearranging its
(decimal) digits (e.g., a = 12345 and a' = 52314). Prove that a − a' is a multiple
of 9.

4.12 Prove that there are no positive integers a, b, c with

a^2 + b^2 + c^2 = 999.

4.13 Prove that there is no perfect square whose last two decimal digits are 35.

4.14 Using Fermat’s Theorem 4.9, prove that if a^p + b^p = c^p, then a + b ≡ c mod p.

Linear congruences 

We are now going to solve linear congruences; that is, we’ll find all the integers
x, if any, satisfying

ax ≡ b mod m.

Later, we will consider several linear congruences in one unknown with distinct
moduli (see Theorems 4.21, 4.25, and 4.27). And we’ll even consider two
linear congruences in more than one unknown (see Theorem 4.44).

Theorem 4.17. If gcd(a, m) = 1, then, for every integer b, the congruence

ax ≡ b mod m

can be solved for x; in fact, x = sb, where as + mt = 1. Moreover, any two
solutions are congruent mod m.

Proof. Since gcd(a, m) = 1, there are integers s and t with as + mt = 1;
that is, as ≡ 1 mod m. Multiplying both sides by b, Proposition 4.5(ii) gives
asb ≡ b mod m, so that x = sb is a solution. If y is another solution, then
ax ≡ ay mod m, and so m | a(x − y). Since gcd(m, a) = 1, Corollary 1.22
gives m | (x − y); that is, x ≡ y mod m. ■


You will have to invent
symbols for some 20-adic
digits.

Theorem 4.27 will generalize
Theorem 4.21 to any
number of moduli.


Corollary 4.18. If p is prime and p ∤ a (i.e., p does not divide a), then the
congruence ax ≡ b mod p is always solvable.

Proof. Since p is a prime, p ∤ a implies gcd(a, p) = 1. ■

Example 4.19. When gcd(a, m) = 1, Theorem 4.17 says that the set of solutions
of ax ≡ b mod m is

{sb + km : where k ∈ Z and sa ≡ 1 mod m}.

Now sa + tm = 1 for some integer t, so that s can always be found by
Euclidean Algorithm II. When m is small and you are working by hand, it is
easier to find such an integer s by trying each of ra = 2a, 3a, ..., (m − 1)a in
turn, at each step checking whether ra ≡ 1 mod m.

For example, let’s find all the solutions to

2x ≡ 9 mod 13.

Considering each of the products 2 · 2, 3 · 2, 4 · 2, ... mod 13 quickly leads
to 7 · 2 = 14 ≡ 1 mod 13; that is, s = 7. By Theorem 4.17, x = 7 · 9 = 63 ≡
11 mod 13. Therefore,

x ≡ 11 mod 13,

and the solutions are ..., −15, −2, 11, 24, 37, .... ▲

Example 4.20. Find all the solutions to 51x ≡ 10 mod 94.

Since 94 is large, seeking an integer s with 51s ≡ 1 mod 94, as in Example 4.19,
is tedious. Euclidean Algorithm II gives 1 = −35 · 51 + 19 · 94,
and so s = −35. (The formulas in Exercise 1.67 on page 36 implement Euclidean
Algorithm II, and they can be programmed on a calculator to produce
the value of s. In fact, a CAS can solve specific congruences, but it can’t (yet)
solve them in general.) Therefore, the set of solutions consists of all integers x
with x ≡ −35 · 10 mod 94; that is, all numbers of the form −350 + 94k.

If you prefer s to be positive, just replace −35 by 59, for 59 ≡ −35 mod 94.
The solutions are now written as all integers x with x ≡ 59 · 10 mod 94; that
is, numbers of the form 590 + 94k. ▲
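
The recipe of Theorem 4.17 (find s with as + mt = 1, then take x = sb) can be sketched in Python. The iterative routine below is one standard form of the extended Euclidean algorithm; it is not claimed to match the formulas of Exercise 1.67 step for step, and the function names are ours.

```python
def extended_gcd(a: int, b: int):
    """Return (g, s, t) with g = gcd(a, b) and a*s + b*t == g, for a, b >= 0."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t

def solve_linear_congruence(a: int, b: int, m: int) -> int:
    """Least nonnegative x with a*x ≡ b mod m, assuming gcd(a, m) == 1."""
    g, s, _ = extended_gcd(a % m, m)
    if g != 1:
        raise ValueError("this sketch only handles gcd(a, m) == 1")
    return (s * b) % m    # x = sb, as in Theorem 4.17

print(solve_linear_congruence(2, 9, 13))
print(solve_linear_congruence(51, 10, 94))
```

For 51x ≡ 10 mod 94 the routine finds s = −35, as in Example 4.20, and reports the least nonnegative solution; every other solution differs from it by a multiple of 94.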

There are problems solved in ancient Chinese manuscripts, arising from 
studying calendars, that involve simultaneous congruences with relatively 
prime moduli. 

Theorem 4.21 (Chinese Remainder Theorem). If m and m' are relatively
prime, then the two congruences

x ≡ b mod m
x ≡ b' mod m'

have a common solution. Moreover, any two solutions are congruent mod
mm'.

Proof. Every solution of the first congruence has the form x = b + km for
some integer k; hence, we must find k such that b + km ≡ b' mod m'; that is,
km ≡ b' − b mod m'. Since gcd(m, m') = 1, however, Theorem 4.17 applies
at once to show that such an integer k does exist.

If y is another common solution, then both m and m' divide x − y; by
Exercise 1.58 on page 35, mm' | (x − y), and so x ≡ y mod mm'. ■

Example 4.22. Let’s find all the solutions to the simultaneous congruences

x ≡ 5 mod 8
x ≡ 11 mod 15.

Every solution to the first congruence has the form

x = 5 + 8k,

for some integer k. Substituting, x = 5 + 8k ≡ 11 mod 15, so that

8k ≡ 6 mod 15.

But 2 · 8 = 16 ≡ 1 mod 15, so that multiplying by 2 gives

16k ≡ k ≡ 12 mod 15.

We conclude that x = 5 + 8 · 12 = 101 is a solution, and the Chinese Remainder
Theorem (which applies because 8 and 15 are relatively prime) says that every
solution has the form 101 + 120n for n ∈ Z (because 120 = 8 · 15). ▲
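
The substitution argument used in the proof of the Chinese Remainder Theorem translates directly into a short program. In this hedged sketch (the name `crt_pair` is ours) the integer k is found by simple trial, which is adequate for small moduli; for large moduli one would instead invert m mod m' with the Euclidean Algorithm.

```python
from math import gcd

def crt_pair(b: int, m: int, b2: int, m2: int) -> int:
    """Least nonnegative x with x ≡ b mod m and x ≡ b2 mod m2, for gcd(m, m2) == 1."""
    if gcd(m, m2) != 1:
        raise ValueError("moduli must be relatively prime")
    # Write x = b + k*m and look for k with k*m ≡ b2 - b mod m2.
    for k in range(m2):
        if (b + k * m - b2) % m2 == 0:
            return (b + k * m) % (m * m2)
    raise AssertionError("unreachable when the moduli are relatively prime")

print(crt_pair(5, 8, 11, 15))
```

For the system of Example 4.22 the search stops at k = 12 and returns the solution found there.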

Example 4.23. We solve the simultaneous congruences

x ≡ −6 mod 13
x ≡ 8 mod 20.

Now gcd(13, 20) = 1, so that we can solve this system as in the proof of the
Chinese Remainder Theorem. The first congruence gives

x = 13k − 6,

for k ∈ Z, and substituting into the second congruence gives

13k − 6 ≡ 8 mod 20;

that is,

13k ≡ 14 mod 20.

Since 13 · 17 = 221 ≡ 1 mod 20, multiplying by 17 gives k ≡ 17 · 14 mod 20,
that is,

k ≡ 18 mod 20.

One finds 17 either by trying
each number between
1 and 19 or by using the
Euclidean Algorithm.

By the Chinese Remainder Theorem, all the simultaneous solutions x have the
form

x = 13k − 6 ≡ (13 · 18) − 6 ≡ 228 mod 260;

that is, the solutions are

..., −32, 228, 488, .... ▲

Remember that 0 denotes
Sunday, ..., 6 denotes
Saturday.


Example 4.24 (A Mayan Calendar). A congruence arises whenever there is
cyclic behavior. For example, suppose we choose some particular Sunday as
time zero and enumerate all the days according to the time elapsed since then.
Every date now corresponds to some integer, which is negative if it occurred
before time zero. Given two dates t_1 and t_2, we ask for the number x = t_2 − t_1
of days from one to the other. If, for example, t_1 falls on a Thursday and t_2
falls on a Tuesday, then t_1 ≡ 4 mod 7 and t_2 ≡ 2 mod 7, and so x = t_2 − t_1 ≡
−2 ≡ 5 mod 7. Thus, x = 7k + 5 for some k and, incidentally, x falls on a
Friday.

About 2500 years ago, the Maya of Central America and Mexico developed 
three calendars (each having a different use). Their religious calendar, called 
tzolkin, consisted of 20 “months,” each having 13 days (so that the tzolkin 
“year” had 260 days). The months were 


 1. Imix        6. Cimi      11. Chuen    16. Cib
 2. Ik          7. Manik     12. Eb       17. Caban
 3. Akbal       8. Lamat     13. Ben      18. Etznab
 4. Kan         9. Muluc     14. Ix       19. Cauac
 5. Chicchan   10. Oc        15. Men      20. Ahau


Let us describe a tzolkin date by an ordered pair

[m, d],

where 1 ≤ m ≤ 20 and 1 ≤ d ≤ 13; thus, m denotes the month and d
denotes the day. Instead of enumerating as we do (so that Imix 1 is followed
by Imix 2, then Imix 3, and so forth), the Maya let both month and day cycle
simultaneously; that is, the days proceed as follows:

Imix 1, Ik 2, Akbal 3, ..., Ben 13, Ix 1, Men 2, ...,
Cauac 6, Ahau 7, Imix 8, Ik 9, ....

We now ask how many days have elapsed between Oc 11 and Etznab 5.
More generally, let x be the number of days from tzolkin [m, d] to tzolkin
[m', d']. As we remarked at the beginning of this example, the cyclic behavior
of the days gives the congruence

x ≡ d' − d mod 13,

while the cyclic behavior of the months gives the congruence

x ≡ m' − m mod 20.

To answer the original question, Oc 11 corresponds to the ordered pair [10, 11]
and Etznab 5 corresponds to [18, 5]. Since 5 − 11 = −6 and 18 − 10 = 8, the
simultaneous congruences are

x ≡ −6 mod 13
x ≡ 8 mod 20.

In the previous example, we found the solutions:

x ≡ 228 mod 260.

It is not clear whether Oc 11 precedes Etznab 5 in a given year (one must
look). If it does, then there are 228 days between them; otherwise, there are
260 − 228 = 32 days between them (the truth is 228). ▲
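
The answer to the tzolkin question can be confirmed without any congruences, simply by stepping through the calendar one day at a time. This Python sketch models only the cycling rule described above (month and day number advance together); the names are ours.

```python
def next_date(m: int, d: int):
    """Advance a tzolkin date [m, d] by one day; month and day number cycle together."""
    return (m % 20 + 1, d % 13 + 1)

def days_between(start, end) -> int:
    """Count single-day steps from `start` to `end` by direct simulation."""
    count, date = 0, start
    while date != end:
        date = next_date(*date)
        count += 1
    return count

oc_11, etznab_5 = (10, 11), (18, 5)
print(days_between(oc_11, etznab_5))   # from Oc 11 forward to Etznab 5
print(days_between(etznab_5, oc_11))   # from Etznab 5 forward to Oc 11
```

The two counts add up to 260, the length of the tzolkin year. The loop always terminates because 13 and 20 are relatively prime, so every pair [m, d] occurs exactly once in each cycle of 260 days.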

If we do not assume that the moduli m and m' are relatively prime, then
there may be no solutions to a linear system. For example, if m = m' > 1,
then uniqueness of the remainder in the Division Algorithm shows that there
is no solution to

x ≡ 0 mod m
x ≡ 1 mod m.

Theorem 4.25. Let d = gcd(m, m'). The system

x ≡ b mod m
x ≡ b' mod m'

has a solution if and only if b ≡ b' mod d.

Exercise 4.19 on page 148
gives a condition guaranteeing
uniqueness of
solutions.

Proof. If h ≡ b mod m and h ≡ b' mod m', then m | (h − b) and m' | (h − b').
Since d is a common divisor of m and m', we have d | (h − b) and d | (h − b').
Therefore, d | (b − b'), because (h − b') − (h − b) = b − b', and so b ≡
b' mod d.

Conversely, assume that b ≡ b' mod d, so that there is an integer k with
b' = b + kd. If m = dc and m' = dc', then gcd(c, c') = 1, by Proposition 1.23.
Hence, there are integers s and t with 1 = sc + tc'. Define
h = b'sc + btc'. Now

h = b'sc + btc'
  = (b + kd)sc + btc'
  = b(sc + tc') + kdsc
  = b + ksm
  ≡ b mod m.

A similar argument, replacing b by b' − kd, shows that h ≡ b' mod m'. ■

Example 4.26. Solve the linear system

x ≡ 1 mod 6
x ≡ 4 mod 15.

Here, b = 1 and b' = 4, while m = 6, m' = 15, and d = 3; hence, c = 2
and c' = 5 (for 6 = 3 · 2 and 15 = 3 · 5). Now s = 3 and t = −1 (for
1 = 3 · 2 + (−1) · 5). Theorem 4.25 applies, for 1 ≡ 4 mod 3. Define

h = 4 · 3 · 2 + 1 · (−1) · 5 = 19.

We check that 19 ≡ 1 mod 6 and 19 ≡ 4 mod 15. Since lcm(6, 15) = 30, the
solutions are ..., −41, −11, 19, 49, 79, .... ▲
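
The proof of Theorem 4.25 is constructive, and the construction h = b'sc + btc' can be carried out mechanically. The Python sketch below follows that formula; the function names are ours, and the particular s and t produced by the recursive Euclidean routine may differ from those chosen by hand in Example 4.26, which changes h only by a multiple of lcm(m, m').

```python
from math import gcd

def extended_gcd(a: int, b: int):
    """Return (g, s, t) with a*s + b*t == g == gcd(a, b), for a, b >= 0."""
    if b == 0:
        return a, 1, 0
    g, s, t = extended_gcd(b, a % b)
    return g, t, s - (a // b) * t

def solve_pair(b: int, m: int, b2: int, m2: int):
    """Return (h, L) with L = lcm(m, m2): the solutions are the x ≡ h mod L.

    Returns None when b and b2 are not congruent mod d = gcd(m, m2).
    """
    d = gcd(m, m2)
    if (b - b2) % d != 0:
        return None
    c, c2 = m // d, m2 // d
    _, s, t = extended_gcd(c, c2)    # 1 = s*c + t*c2
    h = b2 * s * c + b * t * c2      # the h defined in the proof
    lcm = m // d * m2
    return h % lcm, lcm

print(solve_pair(1, 6, 4, 15))
print(solve_pair(0, 4, 1, 4))
```

The first call reproduces Example 4.26; the second is the unsolvable system x ≡ 0 and x ≡ 1 for the same modulus, here with m = 4.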

We are now going to generalize the Chinese Remainder Theorem for any
number of linear congruences whose moduli are pairwise relatively prime. We
shall see in Chapter 6 that this new version, whose solutions are given more
explicitly, can be used to reveal a connection with Lagrange Interpolation, a
method for finding a polynomial that agrees with a finite set of data.

Consider the following problem, adapted from Qin Jiushao, Mathematical
Treatise in Nine Sections, 1247 CE.

Three farmers equally divide the rice that they have grown. One goes to
a market where an 83-pound weight is used, another to a market that
uses a 112-pound weight, and the third to a market using a 135-pound
weight. Each farmer sells as many full measures as possible, and when 
the three return home, the first has 32 pounds of rice left, the second 70 
pounds, and the third 30 pounds. Find the total amount of rice they took 
to market. 

We can model the situation in the problem with three congruences:

x ≡ 32 mod 83
x ≡ 70 mod 112          (4.1)
x ≡ 30 mod 135.

Now, you could solve this system using the same method we used in Exam- 
ple 4.22: just write out each congruence in terms of its corresponding divisi- 
bility tests, and work from there. 

There’s another technique for solving Eqs. (4.1) that works in more general
settings. The idea is to “localize” a solution x, where “localize” means considering
only one modulus at a time, ignoring the other two; that is, making the
other two congruent to zero. Suppose we can find integers u, v, w such that

u ≡ 32 mod 83     v ≡ 0 mod 83      w ≡ 0 mod 83
u ≡ 0 mod 112     v ≡ 70 mod 112    w ≡ 0 mod 112
u ≡ 0 mod 135     v ≡ 0 mod 135     w ≡ 30 mod 135.

Now take x to be u + v + w. Thanks to Proposition 4.5, we can find the
remainder when u + v + w is divided by 83 by first finding the remainders
when each of u, v, and w is divided by 83, and then adding the answers:

x = u + v + w ≡ 32 + 0 + 0 mod 83.

Similarly, x ≡ 70 mod 112 and x ≡ 30 mod 135.

So, how do we find such $u$, $v$, and $w$? Let’s look at what we want $u$ to do:

$$\begin{aligned} u &\equiv 32 \bmod 83 \\ u &\equiv 0 \bmod 112 \\ u &\equiv 0 \bmod 135. \end{aligned}$$

It’s easy to make $u$ congruent to 0 mod 112 and 0 mod 135: just let it be a
multiple of $112 \cdot 135 = 15120$. So, we want $u$ to look like

$$u = k \cdot 112 \cdot 135 = 15120k$$

for some integer $k$. And we choose $k$ to meet the local condition that $u$ wants
to be 32 modulo 83:

$$15120k \equiv 14k \equiv 32 \bmod 83. \tag{4.2}$$

Now comes the important step: since 112 and 135 are relatively prime to 83,
so is their product (Exercise 1.56 on page 35). Hence, 14 (which is the same as
$112 \cdot 135$ modulo 83) is also relatively prime to 83, and so Theorem 4.17 implies
that we can solve Eq. (4.2) for $k$. There is an integer $s$ with $14s \equiv 1 \bmod 83$,
and multiplying both sides of $14k \equiv 32 \bmod 83$ by $s$ gives

$$k \equiv 32s \bmod 83.$$

There are several methods for finding $s$ (since 83 is not so small, the Euclidean
Algorithm is probably the most efficient); in fact, $s = 6$, and so $k$ satisfies

$$k \equiv 6 \cdot 32 = 192 \equiv 26 \bmod 83.$$

Hence,

$$u = 26 \cdot 112 \cdot 135 = 393120.$$

To get a feel for this method, it’s a good idea to go through it twice more, 
finding v and w. In fact, that’s Exercise 4.22 on page 149. 
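The “local” step just carried out for $u$ can be replayed in a few lines of Python. This is a sketch under assumptions: the function name `local_solution` is invented for the illustration, and the modular inverse is taken with the built-in `pow(a, -1, m)` (available from Python 3.8) rather than by running the Euclidean Algorithm by hand.

```python
def local_solution(residue, modulus, other_moduli):
    """Return u with u = residue (mod modulus) and u = 0 modulo each other modulus.

    Assumes `modulus` is relatively prime to every entry of `other_moduli`.
    """
    product = 1
    for m in other_moduli:
        product *= m                          # here: 112 * 135 = 15120
    s = pow(product % modulus, -1, modulus)   # s * product = 1 (mod modulus)
    k = (residue * s) % modulus               # k = 32 * s mod 83
    return k * product

u = local_solution(32, 83, [112, 135])
print(u)  # 393120, as computed above
```

The same call with the roles of the moduli permuted produces candidates for $v$ and $w$.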

The method just developed generalizes to a proof of the extended Chinese 
Remainder Theorem. Let’s first introduce some notation. 

Notation. Given numbers $m_1, m_2, \dots, m_r$, define

$$M_i = m_1 m_2 \cdots \widehat{m_i} \cdots m_r = m_1 \cdots m_{i-1} m_{i+1} \cdots m_r;$$

that is, $M_i$ is the product of all $m_j$ other than $m_i$.

Theorem 4.27 (Chinese Remainder Theorem Redux). If $m_1, m_2, \dots, m_r$
are pairwise relatively prime integers, then the simultaneous congruences

$$\begin{aligned} x &\equiv b_1 \bmod m_1 \\ x &\equiv b_2 \bmod m_2 \\ &\;\;\vdots \\ x &\equiv b_r \bmod m_r \end{aligned}$$

have an explicit solution, namely

$$x = b_1 (s_1 M_1) + b_2 (s_2 M_2) + \cdots + b_r (s_r M_r),$$

where

$$M_i = m_1 m_2 \cdots \widehat{m_i} \cdots m_r \quad\text{and}\quad s_i M_i \equiv 1 \bmod m_i \ \text{ for } 1 \le i \le r.$$

Furthermore, any solution to this system is congruent to $x \bmod m_1 m_2 \cdots m_r$.

Proof. Use our discussion on the previous page as a model for the proof.
That the specified $x$ works is a consequence of Proposition 4.5. That all
solutions are congruent mod $m_1 m_2 \cdots m_r$ is a consequence of Exercise 1.58 on
page 35. ■
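The explicit solution in Theorem 4.27 translates almost word for word into code. The sketch below is illustrative only: the name `crt` is an assumption of this example, and each $s_i$ is obtained with Python's built-in `pow(M_i, -1, m_i)` (Python 3.8 or later), which raises `ValueError` if $M_i$ is not invertible mod $m_i$.

```python
from math import prod

def crt(residues, moduli):
    """Return (x, M) with x = b_i (mod m_i) for all i and 0 <= x < M = m_1 ... m_r.

    The moduli are assumed to be pairwise relatively prime.
    """
    M = prod(moduli)
    x = 0
    for b, m in zip(residues, moduli):
        M_i = M // m               # product of all the moduli other than m
        s_i = pow(M_i, -1, m)      # s_i * M_i = 1 (mod m)
        x += b * s_i * M_i
    return x % M, M

# The system of Exercise 4.17(iii): x = 3 mod 17, x = 9 mod 11.
print(crt([3, 9], [17, 11]))       # (20, 187)
```

Reducing the sum mod $M$ at the end picks the smallest nonnegative representative of the solution class.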
Exercises

4.15 * Complete the proof of Theorem 4.27. 

4.16 (i) Solve

$$\begin{aligned} x &\equiv 5 \bmod 7 \\ x &\equiv 2 \bmod 11. \end{aligned}$$

(ii) In the year 2000, the remainder after dividing my age by 3 was 2, and the 
remainder after dividing by 8 was 3. If I was a child when people first walked 
on the Moon, how old was I in 2000? 

(iii) Solve 

x 1 = 5 mod 7 
x 1 1 =2 mod 1 1 . 

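4.17 (i) Find a solution $v$ to

$$\begin{aligned} v &\equiv 3 \bmod 17 \\ v &\equiv 0 \bmod 11. \end{aligned}$$

Answer. $v \equiv 88 \bmod 187$.

(ii) Find a solution $w$ to

$$\begin{aligned} w &\equiv 0 \bmod 17 \\ w &\equiv 9 \bmod 11. \end{aligned}$$

Answer. $w \equiv 119 \bmod 187$.

(iii) Using your $v$ and $w$ from (i) and (ii), show that $v + w$ is a solution to the
system

$$\begin{aligned} x &\equiv 3 \bmod 17 \\ x &\equiv 9 \bmod 11. \end{aligned}$$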


4.18 Solve

$$\begin{aligned} x &\equiv 32 \pmod{83} \\ x &\equiv 70 \pmod{112} \\ x &\equiv 30 \pmod{135}. \end{aligned}$$

4.19 * Theorem 4.25 says that if $d = \gcd(m, m')$, then the system

$$\begin{aligned} x &\equiv b \bmod m \\ x &\equiv b' \bmod m' \end{aligned}$$

has a solution if and only if $b \equiv b' \bmod d$. Prove that any two solutions are
congruent mod $\ell$, where $\ell = \operatorname{lcm}(m, m')$.

4.20 How many days are there between Akbal 13 and Muluc 8 in the Mayan tzolkin 
calendar? 

4.21 On a desert island, five men and a monkey gather coconuts all day, then sleep. The 
first man awakens and decides to take his share. He divides the coconuts into five 
equal shares, with one coconut left over. He gives the extra one to the monkey, 
hides his share, and goes to sleep. Later, the second man awakens and takes his 
fifth from the remaining pile; he, too, finds one extra and gives it to the monkey. 
Each of the remaining three men does likewise in turn. Find the minimum number 
of coconuts originally present. 

Hint. Try $-4$ coconuts.






4.22 * Finish the calculations solving Qin Jiushao’s problem on page 146 by first
finding $v$ and $w$, and then finding the smallest positive solution.

4.23 A band of 17 pirates stole a sack of gold coins. When the coins were divided
equally, there were three left over. So, one pirate was made to walk the plank.
Again the sack was divided equally; this time there were 10 gold coins left over.
So, another unlucky member of the crew took a walk. Now, the gold coins could
be distributed evenly with none left over. How many gold coins were in the sack?

4.24 (Bhaskara I, ca. 650 C.E.). If eggs in a basket are taken out 2, 3, 4, 5, and 6 at a
time, there are 1, 2, 3, 4, and 5 eggs left over, respectively. If they are taken out 7 
at a time, there are no eggs left over. What is the least number of eggs that can be 
in the basket? 


4.2 Public Key Codes 

A thief who knows your name and credit card number can use this information 
to steal your money. So why isn’t it risky to buy something online, and pay 
for it by sending your credit card data? After all, thieves can read the message 
you are sending. Here’s why: the online company’s software encodes your 
information before it is transmitted; the company can decode it, but the thieves 
cannot. And the reason the thieves cannot decode your message is that codes 
are constructed in a clever way using number theory. 

It is no problem to convert a message in English into a number. Make a list 
of the 52 English letters (lower case and upper case) together with a space and 
the 11 punctuation marks


In all, there are 64 symbols. Assign a two-digit number to each symbol. For
example,

$$\texttt{a} \mapsto 01,\ \dots,\ \texttt{z} \mapsto 26,\ \texttt{A} \mapsto 27,\ \dots,\ \texttt{Z} \mapsto 52,$$
$$\text{space} \mapsto 53,\ \texttt{.} \mapsto 54,\ \texttt{,} \mapsto 55,\ \dots,\ \texttt{(} \mapsto 63,\ \texttt{)} \mapsto 64$$

(we could add more symbols if we wished: say, \$, $+$, $-$, $=$, $\to$, $0, 1, \dots, 9$). A
cipher is a code in which distinct letters in the original message are replaced by
distinct symbols. It is not difficult to decode a cipher; indeed, many newspapers
print daily cryptograms to entertain their readers. In the cipher we have just
described, “I love you!” is encoded

I love you! = 3553121522055325152158

Notice that any message coded in this cipher has an even number of digits, and
so decoding, converting the number into English, is a simple matter. Thus,

(35)(53)(12)(15)(22)(05)(53)(25)(15)(21)(58) = I love you!
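The two-digit cipher is easy to mechanize. The Python sketch below is an illustration under stated assumptions: the names `SYMBOLS`, `encode`, and `decode` are invented here, and only the punctuation codes that can be read off the text (`.` = 54, `,` = 55, `!` = 58, `(` = 63, `)` = 64) are included; the codes of the remaining punctuation marks are not reproduced above, so they are left out rather than guessed.

```python
import string

# a..z -> 01..26 and A..Z -> 27..52, as in the text.
SYMBOLS = {ch: i + 1
           for i, ch in enumerate(string.ascii_lowercase + string.ascii_uppercase)}
SYMBOLS[" "] = 53
# Assumption: only the punctuation codes visible in the text are listed.
SYMBOLS.update({".": 54, ",": 55, "!": 58, "(": 63, ")": 64})
CODES = {code: ch for ch, code in SYMBOLS.items()}

def encode(message):
    """Replace each symbol by its two-digit code."""
    return "".join(f"{SYMBOLS[ch]:02d}" for ch in message)

def decode(digits):
    """Split the digit string into pairs and look each pair up."""
    return "".join(CODES[int(digits[i:i + 2])] for i in range(0, len(digits), 2))

print(encode("I love you!"))               # 3553121522055325152158
print(decode("3553121522055325152158"))    # I love you!
```

Because every symbol uses exactly two digits, decoding never has to guess where one symbol ends and the next begins.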

What makes a good code? If a message is a natural number x (and this is no 
loss in generality, as we have just seen), we need a way to encode x (in a fairly 
routine way so as to avoid introducing any errors into the coded message), 
and we need a (fairly routine) method for the recipient to decode the message. 
Of utmost importance is security: an unauthorized reader of a coded message
should not be able to decode it. An ingenious way to find a code with these
properties, now called an RSA code, was found in 1978 by Rivest, Shamir,
and Adleman; they received the 2002 Turing Award for their discovery.

The following terms describe two basic ingredients of RSA codes.

Definition. A public key is an ordered pair $(N, e)$, where $N = pq$ is a product
of distinct primes $p$ and $q$, and $e$ is a positive integer with $\gcd(e, p - 1) = 1$
and $\gcd(e, q - 1) = 1$.

Why these conditions
on $e$? Read on.

The numbers $N$ and $e$ are public (they are published on the web), but the
primes $p$ and $q$ are kept secret. In practice, the primes $p$ and $q$ are very large.

If $x$ is a message, encoded by assigning natural numbers to its letters as
discussed above, then the encoded message sent is

$$x^e \bmod N.$$

Definition. Given a public key $(N, e)$, a private key is a number $d$ such that
$x^{ed} \equiv x \bmod N$ for all $x \in \mathbb{Z}$.

A private key essentially decodes the sent message, for

$$x^{ed} = (x^e)^d \equiv x \bmod N.$$

Only the intended recipients know the private key $d$. To find $d$, we’ll see
that you need to factor $N$, and that’s very hard. Indeed, the modulus $N$ being
a product of two very big primes, each having hundreds of digits, is what
makes factoring $N$ so difficult. Since breaking the code requires knowing $p$
and $q$, this is the reason RSA codes are secure. Now for the details.

Ease of Encoding and Decoding

Given a public key $(N, e)$ and a private key $d$, we encode $x$ as $x^e$, and we
send the congruence class $x^e \bmod N$. A recipient who knows the number $d$
can decode this, because

$$(x^e)^d = x^{ed} \equiv x \bmod N.$$

There is a minor problem here, for decoding isn’t yet complete: we know the
congruence class of the original message $x$ but not $x$ itself; that is, we know
$x + kN$ for some $k \in \mathbb{Z}$ but not $x$. There is a routine way used to get around
this; one encodes long blocks of text, not just letters (see [18], pp. 88-91).

Given any positive integer $m$, an efficient computation of $x^m \bmod N$ is
based on the fact that computing $x^2 \bmod N$ is an easy task for a computer.
Since computing $x^{2^i}$ is just computing $i$ squares, this, too, is an easy task.

Note that $x^4 = (x^2)^2$,
$x^8 = (x^4)^2 = ((x^2)^2)^2$,
etc.

Now write the exponent $m$ in base 2:

$$m = 2^i + 2^j + \cdots + 2^z.$$

Computing $x^m$ is the same as multiplying several squares:

$$x^m = x^{2^i + 2^j + \cdots + 2^z} = x^{2^i} x^{2^j} \cdots x^{2^z}.$$

In particular, after writing $e$ in base 2, computers can easily encode a message
$x \bmod N$ as $x^e \bmod N$ and, after writing $ed$ in base 2, they can easily decode
$x^{ed} \bmod N$. Since $x^{ed} \equiv x \bmod N$, this congruence essentially recaptures $x$.
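The repeated-squaring idea can be sketched in Python as follows. The function name `power_mod` is made up for this illustration; Python's built-in three-argument `pow(x, m, N)` computes the same value, and the loop is shown only to make the base-2 bookkeeping visible.

```python
def power_mod(x, m, N):
    """Compute x**m mod N by repeated squaring, reducing mod N at every step.

    Assumes m >= 0 and N >= 2.
    """
    result = 1
    square = x % N                 # holds x^(2^i) mod N for i = 0, 1, 2, ...
    while m > 0:
        if m & 1:                  # 2^i occurs in the base-2 expansion of m
            result = (result * square) % N
        square = (square * square) % N
        m >>= 1                    # move on to the next binary digit of m
    return result

print(power_mod(4, 7, 143), pow(4, 7, 143))   # both print the same residue
```

The loop runs once per binary digit of $m$, so the number of multiplications grows with the number of digits of $m$, not with $m$ itself.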
Finding a Private Key

Let $(N, e)$ be a public key, where $N = pq$. We want to find a private key; that
is, a number $d$ so that $x^{ed} \equiv x \bmod N$ for all $x \in \mathbb{Z}$. More generally, let’s
find conditions on any integer $m$ so that $x^m \equiv x \bmod pq$. By Corollary 4.10,
we have $x^m \equiv x \bmod p$ if $m \equiv 1 \bmod (p - 1)$; similarly, $x^m \equiv x \bmod q$ if
$m \equiv 1 \bmod (q - 1)$. Now if $m$ satisfies both congruence conditions, then

$$p \mid (x^m - x) \quad\text{and}\quad q \mid (x^m - x).$$

As $p$ and $q$ are distinct primes, they are relatively prime, and so $pq \mid (x^m - x)$,
by Exercise 2.20 on page 33. Hence, $x^m \equiv x \bmod pq$ for all $x$; that is,
$x^m \equiv x \bmod N$ for all $x \in \mathbb{Z}$.

Return now to the special case $m = ed$: can we find a private key $d$ so that
$ed \equiv 1 \bmod (p - 1)$ and $ed \equiv 1 \bmod (q - 1)$? By hypothesis,
$\gcd(e, p - 1) = 1 = \gcd(e, q - 1)$; by Exercise 1.56 on page 35, $\gcd(e, (p - 1)(q - 1)) = 1$.
We can now find $d$ with Proposition 4.17, which shows how to construct an
integer $d$ such that

$$ed \equiv 1 \bmod (p - 1)(q - 1).$$

Since $p - 1$ and $q - 1$ both divide $(p - 1)(q - 1)$, such a $d$ satisfies both
congruences $ed \equiv 1 \bmod (p - 1)$ and $ed \equiv 1 \bmod (q - 1)$.

We have constructed an RSA code.

Example 4.28. Let’s create a public key and a private key using $p = 11$ and
$q = 13$. (This is just for the sake of illustration; in practice, both $p$ and $q$ need
to be extremely large primes.)

The modulus is $N = pq = 11 \cdot 13 = 143$, and so $p - 1 = 10$ and
$q - 1 = 12$. Let’s choose $e = 7$ (note that $\gcd(7, 10 \cdot 12) = 1$). Hence the
public key is

$$(N, e) = (143, 7).$$

If $x$ is a message in cipher (i.e., a natural number), then the encoded message is
the congruence class $x^7 \bmod 143$. To find the private key, we need a number $d$
so that $7d \equiv 1 \bmod 120$. Using Euclidean Algorithm II or a CAS, we find a
private key

$$d = 103,$$

for $7 \cdot 103 = 721 = 6 \cdot 120 + 1$.

Let’s encode and decode the word “dog”: d = 4; o = 15; g = 7. Thus,
the cipher for dog is 041507. In the real world, the encoding is $(41507)^7$, and
the message sent out is the congruence class $(41507)^7 \bmod 143$. Decoding
involves computing $(41507)^{721} \bmod 143$. As we said earlier, decoding is not
finished by finding this congruence class; the numbers in this class are of the
form $(41507)^{721} + 143k$, and only one of these must be determined. As we
said above, the method used in actual RSA transmissions encodes blocks of
letters to get around this ambiguity. For this example, however, we’ll use a
simpler method: we’ll send each letter separately, so that “dog” is sent as
three codes

$$04^7, \quad 15^7, \quad 07^7.$$

This eliminates the ambiguity of recovering a congruence class rather than
an integer, because each letter will correspond to a (unique) integer less
than 143.

The encoding is calculated like this:

$$\begin{aligned} \text{d:}\quad 4^7 &\equiv 82 \bmod 143 \\ \text{o:}\quad 15^7 &\equiv 115 \bmod 143 \\ \text{g:}\quad 7^7 &\equiv 6 \bmod 143. \end{aligned}$$

To decode these messages, apply the private key:

$$\begin{aligned} 82^{103} &\equiv 4 \bmod 143 \\ 115^{103} &\equiv 15 \bmod 143 \\ 6^{103} &\equiv 7 \bmod 143. \end{aligned}$$

We get $4 \leftrightarrow \text{d}$, $15 \leftrightarrow \text{o}$, and $7 \leftrightarrow \text{g}$: “dog,” which was the original message. ▲
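For readers with Python at hand, the whole of Example 4.28 can be replayed in a few lines. This is a toy sketch, not a secure implementation: the variable names are chosen for the illustration, and the private key is found with the built-in `pow(e, -1, n)` (Python 3.8 or later) in place of Euclidean Algorithm II.

```python
from math import gcd

p, q, e = 11, 13, 7
N = p * q                                  # public modulus 143
assert gcd(e, p - 1) == 1 and gcd(e, q - 1) == 1

d = pow(e, -1, (p - 1) * (q - 1))          # solves e*d = 1 mod 120

letters = {"d": 4, "o": 15, "g": 7}        # letter codes from the cipher
sent = [pow(letters[ch], e, N) for ch in "dog"]   # encode: x -> x^e mod N
received = [pow(c, d, N) for c in sent]           # decode: y -> y^d mod N

print(d)          # 103
print(sent)       # [82, 115, 6]
print(received)   # [4, 15, 7]
```

Sending each letter separately, as the example does, keeps every plaintext number below $N = 143$, so the decoded residue is the letter code itself.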

How to Think About It. A CAS can easily tell you that $82^{103} \equiv 4 \bmod 143$,
but it’s interesting to see how the theorems developed in this chapter can
allow you to do the computation by hand. Start with the fact that the reduction
of $82^{103} \bmod 143$ is equivalent to two calculations, since 143 is $11 \cdot 13$:

$$82^{103} \bmod 11 \qquad\text{and}\qquad 82^{103} \bmod 13.$$

The computations of the remainders when $82^{103}$ is divided by a prime are
made easy via Fermat’s Little Theorem and the “reduce as you go” idea:

$$\begin{aligned} 82^{103} &\equiv 5^{103} \bmod 11 \quad (\text{because } 82 \equiv 5 \bmod 11) \\ 82^{103} &\equiv 4^{103} \bmod 13 \quad (\text{because } 82 \equiv 4 \bmod 13). \end{aligned}$$

Now work on the exponents:

$$5^{103} = 5^{10 \cdot 10 + 3} = (5^{10})^{10} \, 5^3 \equiv 5^3 \bmod 11 \quad (\text{Little Fermat})$$
$$= 125 \equiv 4 \bmod 11$$

and

$$4^{103} = 4^{12 \cdot 8 + 7} = (4^{12})^{8} \, 4^7 \equiv 4^7 \bmod 13 \quad (\text{Little Fermat})$$
$$= 4^3 \cdot 4 \cdot 4^3 = 64 \cdot 4 \cdot 64 \equiv (-1) \cdot 4 \cdot (-1) \equiv 4 \bmod 13.$$

Since $82^{103} - 4$ is divisible by both 11 and 13, it is divisible by 143; that is,
$82^{103} \equiv 4 \bmod 143$.
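The hand computation above follows a pattern that can be sketched in Python: reduce the base and the exponent modulo each prime, then glue the two residues back together with the Chinese Remainder Theorem. The function names `reduce_power` and `pow_mod_pq` are invented for this illustration, and the exponent is assumed to be at least 1.

```python
def reduce_power(x, k, p):
    """x**k mod p for a prime p and k >= 1, via Fermat's Little Theorem."""
    x %= p                          # "reduce as you go": shrink the base first
    if x == 0:
        return 0                    # p divides x, so every positive power is 0
    return pow(x, k % (p - 1), p)   # x^(p-1) = 1 mod p lets us shrink k

def pow_mod_pq(x, k, p, q):
    """x**k mod pq for distinct primes p and q, recombined by the CRT."""
    r_p = reduce_power(x, k, p)
    r_q = reduce_power(x, k, q)
    s = pow(q, -1, p)               # s*q = 1 mod p
    t = pow(p, -1, q)               # t*p = 1 mod q
    return (r_p * s * q + r_q * t * p) % (p * q)

print(reduce_power(82, 103, 11))    # 4
print(reduce_power(82, 103, 13))    # 4
print(pow_mod_pq(82, 103, 11, 13))  # 4
```

The recombination line is the explicit CRT solution for two moduli, with $q$ and $p$ playing the roles of the products of the “other” moduli.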


Constructing Secure RSA Codes 

Let’s construct a specific type of public key. Choose distinct primes
$p \equiv 2 \bmod 3$ and $q \equiv 2 \bmod 3$. Now $p - 1 \equiv 1 \bmod 3$, so that $\gcd(3, p - 1) = 1$,
and $q - 1 \equiv 1 \bmod 3$, so that $\gcd(3, q - 1) = 1$. Therefore, $(N, 3)$ is a public
key, where $N = pq$. The reason that these RSA codes are so secure is that the
factorization of a product $N = pq$ of two very large primes is very difficult.
Thieves may know the transmitted message $x^3 \bmod N$, and they may even
know $N$, but without knowing the factorization of $N = pq$, they don’t know
$p - 1$ and $q - 1$, hence, they don’t know $d$ (for $3d \equiv 1 \bmod (p - 1)(q - 1)$),
and they can’t decode. Indeed, if both $p$ and $q$ have about 200 digits (and,
for technical reasons, they are not too close together), then the fastest existing
computers need two or three months to factor $N$. A theorem of Dirichlet ([5],
p. 339) says that if $\gcd(a, b) = 1$, then the arithmetic progression $a + bn$,
where $n \ge 0$, contains infinitely many primes. In particular, there are infinitely
many primes of the form $2 + 3n$; that is, there are infinitely many primes $p$
with $p \equiv 2 \bmod 3$. Hence, we may choose a different pair of primes $p$ and $q$
every month, say, thereby stymying the crooks.
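A toy version of this construction, with primes far too small to be secure, might look as follows in Python. The helper names `is_prime` and `make_key` and the particular choice $p = 17$, $q = 23$ are assumptions made only for this sketch.

```python
def is_prime(n):
    """Trial division; adequate only for the tiny numbers used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def make_key(p, q):
    """Return (N, e, d) with e = 3 for distinct primes p, q = 2 mod 3."""
    assert p != q and is_prime(p) and is_prime(q)
    assert p % 3 == 2 and q % 3 == 2        # forces gcd(3, p-1) = gcd(3, q-1) = 1
    d = pow(3, -1, (p - 1) * (q - 1))       # 3d = 1 mod (p-1)(q-1)
    return p * q, 3, d

candidates = [n for n in range(2, 60) if is_prime(n) and n % 3 == 2]
print(candidates)                           # primes below 60 of the form 2 + 3n

N, e, d = make_key(17, 23)
# Decoding undoes encoding for every residue class mod N.
print(all(pow(pow(x, e, N), d, N) == x for x in range(N)))
```

With realistic key sizes the primality test and the choice of primes would be done quite differently; the sketch only checks the arithmetic of the construction.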

RSA codes have been refined and made even more secure over the years. 
Some of these refinements make use of elliptic curves, which we’ll touch on in 
Chapter 9. The book In Code [13] is a readable account of how a high school 
student contributed to other refinements. 


Exercises 


4.25 For this exercise, use the primes $p = 5$ and $q = 17$ to create public and private
keys.

(i) What will be the modulus $N$ for the public key?

(ii) The exponent $e$ for the public key must have no common factors with $p - 1$
and $q - 1$. List the five smallest numbers relatively prime to $(p - 1)(q - 1)$.

(iii) There are many possibilities for $e$; for now, use $e = 3$. To encode letters (a
computer would do blocks of letters), use the rule $x \mapsto x^3 \bmod 85$.

(iv) Encode the phrase “cell phones” using this method.

(v) The private key $d$ satisfies

$$ed \equiv 1 \bmod (p - 1)(q - 1).$$

Find $d$, decode your message using the private key, and verify that it is,
indeed, what was sent.

4.26 The following message was encoded using the public key (85, 3): 

01 42 59 10 49 27 56; 

decode this message. It answers the question, “What do you call a boomerang that 
doesn’t come back when you throw it?” 

4.27 Decode the following message encoded using the public key (91,5): 

04 31 38 38 23 71 14 31. 

4.28 Let $m$ and $r$ be nonnegative integers, and $p$ be a prime. If $m \equiv r \bmod (p - 1)$,
show that $x^m \equiv x^r \bmod p$ for all integers $x$.

4.29 Take It Further. (Electronic Signatures) Consider this scenario: Elvis receives 
an email, encoded with his public key, from his abstract algebra instructor Mr. Jag- 
ger, which says that algebra is a waste of time and Elvis should spend all his time 
watching TV. Elvis suspects that the message didn’t really come from Mr. J., but 
how can he be sure? 

Suppose both Elvis and Mr. Jagger have private keys, and each knows the 
other’s public keys. They can communicate in total privacy, with no one able to 
read their messages. Here’s how: if Elvis wants to send a message to Mr. J., he 
follows these steps: 

• Write the message to get $x_1$.

• Encode the message with his private key to get $x_2$.

• Encode $x_2$ with Mr. J.’s public key to get $x_3$.

• Send $x_3$.


The public key reveals 
e = 3 and N = pq, but 
p and q are not revealed. 
(Why not?) 
These public keys are not
realistic. In reality, public 
keys use much larger 
primes. 


When Mr. J. receives the message, he can follow a procedure to get the original 
message back. 

(i) What is the procedure? 

(ii) Explain why no one besides Mr. Jagger could read the message from Elvis.

4.30 Take It Further. Elvis is home sick with the flu. He decides to send a message 
to Mr. Jagger, using the method from Exercise 4.29. Suppose Elvis’s public key 
is (253, 7) and Mr. J.’s public key is (203, 5). Elvis sends the message 

FIDO ATE MY HOMEWORK. 

What is the encoded message that Mr. J. receives? Show how Elvis encodes it and 
how Mr. J. decodes it. 


4.3 Commutative Rings 

We begin this section by showing, for an integer $m \ge 2$, that we can add and
multiply the remainders $0, 1, \dots, m - 1$ in such a way that the new operations
behave very much like ordinary addition and multiplication in $\mathbb{Z}$. Once this is
done, we will be able to revisit congruences and understand what “makes them
tick.”

It is shown in Appendix A.2 that if $\equiv$ is an equivalence relation on a set $X$,
then the equivalence class of an element $a \in X$ is

$$[a] = \{x \in X : x \equiv a\}.$$

Now Proposition 4.3 says that congruence mod $m$ is an equivalence relation
on $\mathbb{Z}$; the equivalence class of an integer $a$ is called its congruence class
mod $m$.


The congruence class 
[a] does depend on m, 
but it is standard practice 
not to make m part of 
the notation. In fact, we’ll 
eventually write a instead 
of [a]. 


Definition. The congruence class mod $m$ of an integer $a$ is

$$\begin{aligned} [a] &= \{k \in \mathbb{Z} : k \equiv a \bmod m\} \\ &= \{\dots, a - 2m, a - m, a, a + m, a + 2m, \dots\}. \end{aligned}$$

The integers mod $m$ is the set of all congruence classes:

$$\mathbb{Z}_m = \{[0], [1], \dots, [m - 1]\}.$$

Corollary 4.4 says that the list $[0], [1], \dots, [m - 1]$ is complete; that is, there
are no other congruence classes mod $m$.

For example, $\mathbb{Z}_2$, the integers mod 2, is the set $\{[0], [1]\}$; we may think of
$[0]$ as even (for $[0] = \{a \in \mathbb{Z} : a \equiv 0 \bmod 2\}$ is the set of all even integers)
and $[1]$ as odd (for $[1]$ is the set of all odds).

Here is the “theological reason” for introducing congruence classes. We
could continue to deal with integers and congruence; this is, after all, what
Gauss did. We saw in Proposition 4.5 that $+$ and $\times$ are compatible with
congruence: if $a \equiv a' \bmod m$ and $b \equiv b' \bmod m$, then $a + b \equiv a' + b' \bmod m$
and $ab \equiv a'b' \bmod m$. But wouldn’t life be simpler if we could replace $\equiv$ by
$=$; that is, if we could replace congruence by equality? We state the following
special case of Lemma A.16 in Appendix A.2 explicitly:

$$a \equiv b \bmod m \quad\text{if and only if}\quad [a] = [b] \text{ in } \mathbb{Z}_m.$$

We often say “odd + odd = even,” which does replace $\equiv$ by $=$ at the cost of
replacing integers by their congruence classes. Thus, we should define addition
of these congruence classes so that $[1] + [1] = [0]$.

Addition and multiplication of evens and odds lead to the following tables.

   +    | even   odd            ×    | even   odd
  ------+-------------         ------+-------------
   even | even   odd            even | even   even
   odd  | odd    even           odd  | even   odd

Rewrite these tables using congruence classes mod 2.

   +    | [0]   [1]             ×    | [0]   [1]
  ------+-----------           ------+-----------
   [0]  | [0]   [1]             [0]  | [0]   [0]
   [1]  | [1]   [0]             [1]  | [0]   [1]

We saw above that $[1] + [1] = [0]$ says that “odd + odd = even;” note that
$[1] \times [1] = [1]$ says “odd × odd = odd.” The table above on the left
defines addition $\alpha\colon \mathbb{Z}_2 \times \mathbb{Z}_2 \to \mathbb{Z}_2$; the table on the right defines multiplication
$\mu\colon \mathbb{Z}_2 \times \mathbb{Z}_2 \to \mathbb{Z}_2$. As usual, we view congruence as generalizing parity, and
we now extend the definitions to give addition and multiplication of
congruence classes mod $m$ for all $m \ge 2$.

Definition. If $m \ge 2$, addition and multiplication $\mathbb{Z}_m \times \mathbb{Z}_m \to \mathbb{Z}_m$ are defined
by

$$[r] + [s] = [r + s] \quad\text{and}\quad [r][s] = [rs].$$

A binary operation on a set
$R$ is a function $f\colon R \times R \to R$
(in particular, $R$ is closed
under $f$: if $a$ and $b$ are in
$R$, then $f(a, b)$ is in $R$).
Can you prove associativity
of the binary operations $\alpha$
and $\mu$ when $R = \mathbb{Z}_2$?


The definitions are simple and natural. However, we are adding and
multiplying congruence classes, not remainders. After all, remainders are integers
between 0 and $m - 1$, but the sum and product of remainders can exceed $m - 1$,
and hence are not remainders.


Lemma 4.29. Addition and multiplication $\mathbb{Z}_m \times \mathbb{Z}_m \to \mathbb{Z}_m$ are well-defined
functions.

Proof. To see that addition is well-defined, we must show that if $[r] = [r']$
and $[s] = [s']$, then $[r + s] = [r' + s']$. But this is precisely what was
proved in Proposition 4.5. A similar argument shows that multiplication is
well-defined. ■
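Lemma 4.29 can be spot-checked numerically. In the Python sketch below, a congruence class $[r]$ is represented by the remainder `r % m`; the function names and the brute-force test over a window of representatives are assumptions of this illustration, and such a check is evidence, not a proof.

```python
def add(r, s, m):
    """Representative in 0..m-1 of [r] + [s] = [r + s] in Z_m."""
    return (r + s) % m

def mul(r, s, m):
    """Representative in 0..m-1 of [r][s] = [rs] in Z_m."""
    return (r * s) % m

def well_defined(m, span=3):
    """Check that changing representatives never changes the answer.

    Every r, s in a window around 0 is replaced by another member of its
    congruence class (differing by a multiple of m).
    """
    reps = range(-span * m, span * m)
    return all(
        add(r, s, m) == add(r2, s2, m) and mul(r, s, m) == mul(r2, s2, m)
        for r in reps for s in reps
        for r2 in (r + m, r - 2 * m)
        for s2 in (s - m, s + 3 * m)
    )

# The addition and multiplication tables of Z_2, row by row.
print([[add(r, s, 2) for s in range(2)] for r in range(2)])   # [[0, 1], [1, 0]]
print([[mul(r, s, 2) for s in range(2)] for r in range(2)])   # [[0, 0], [0, 1]]
print(well_defined(12))                                       # True
```

Python's `%` always returns a value between 0 and $m - 1$ for positive $m$, even for negative arguments, which is what makes `r % m` a usable name for the class $[r]$.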

Binary operations $f\colon R \times R \to R$, being functions, are single-valued. This
is usually called the Law of Substitution in this context: If $(r, s) = (r', s')$,
then $f(r, s) = f(r', s')$. In particular, if $f\colon \mathbb{Z}_m \times \mathbb{Z}_m \to \mathbb{Z}_m$ is addition or
multiplication, then $[r] = [r']$ and $[s] = [s']$ imply $[r] + [s] = [r'] + [s']$ and
$[r][s] = [r'][s']$.

We are now going to show that these binary operations on Z m enjoy eight 
of the nine fundamental properties of ordinary arithmetic on page 37. We have 
already seen several number systems in which addition and multiplication sat- 
isfy these familiar properties, so let’s make these properties into a definition. 
Negatives are often called
additive inverses.


Definition. A commutative ring is a nonempty set $R$ having two binary
operations: addition $R \times R \to R$, denoted by $(r, s) \mapsto r + s$, and multiplication
$R \times R \to R$, denoted by $(r, s) \mapsto rs$, which satisfy the following axioms for
all $a, b, c \in R$:

(i) $a + b = b + a$;

(ii) there is $0 \in R$ with $a + 0 = a$ for all $a \in R$;

(iii) for each $a \in R$, there is $-a \in R$, called its negative, such that
$-a + a = 0$;

(iv) (Associativity of Addition) $a + (b + c) = (a + b) + c$;

(v) (Commutativity of Multiplication) $ab = ba$;

(vi) there is $1 \in R$, called its identity, with $1 \cdot a = a$ for all $a \in R$;

(vii) (Associativity of Multiplication) $a(bc) = (ab)c$;

(viii) (Distributivity) $a(b + c) = ab + ac$.


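How to Think About It. There are more general (non-commutative) rings
in which (v), commutativity of multiplication, is not assumed, while (vi) is
modified to say that $1 \cdot a = a = a \cdot 1$ and (viii) is modified to say
$a(b + c) = ab + ac$ and $(b + c)a = ba + ca$. A good example is the ring of all $2 \times 2$
matrices with entries in $\mathbb{R}$, with identity element $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, and binary operations
ordinary matrix addition and multiplication:

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} + \begin{bmatrix} a' & b' \\ c' & d' \end{bmatrix} = \begin{bmatrix} a + a' & b + b' \\ c + c' & d + d' \end{bmatrix}$$

and

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} a' & b' \\ c' & d' \end{bmatrix} = \begin{bmatrix} aa' + bc' & ab' + bd' \\ ca' + dc' & cb' + dd' \end{bmatrix}.$$
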

Since all rings in this book are commutative, we will often abuse language and 
abbreviate “commutative ring” to “ring.” 
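To see concretely why the ring of $2 \times 2$ matrices is not commutative, one can multiply two matrices in both orders. The Python sketch below is illustrative: matrices are stored as nested tuples, the function names `mat_add` and `mat_mul` are invented here, and the two matrices `A` and `B` are chosen for the example, not taken from the text.

```python
def mat_add(A, B):
    """Entrywise sum of two 2 x 2 matrices given as ((a, b), (c, d))."""
    (a, b), (c, d) = A
    (a2, b2), (c2, d2) = B
    return ((a + a2, b + b2), (c + c2, d + d2))

def mat_mul(A, B):
    """Ordinary matrix product, following the displayed formula."""
    (a, b), (c, d) = A
    (a2, b2), (c2, d2) = B
    return ((a * a2 + b * c2, a * b2 + b * d2),
            (c * a2 + d * c2, c * b2 + d * d2))

I = ((1, 0), (0, 1))      # identity element
A = ((0, 1), (0, 0))
B = ((0, 0), (1, 0))

print(mat_mul(A, B))      # ((1, 0), (0, 0))
print(mat_mul(B, A))      # ((0, 0), (0, 1))
print(mat_mul(I, A) == A == mat_mul(A, I))   # True
```

The two products differ, so axiom (v) fails, while the identity matrix satisfies the two-sided version of axiom (vi).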


The ninth fundamental property of real numbers is: If $a \ne 0$, there is a real
number $a^{-1}$, called its (multiplicative) inverse, such that $a \cdot a^{-1} = 1$. We will
soon consider commutative rings, called fields, which enjoy this property as
well.


How to Think About It. The notion of commutative ring wasn’t conceived 
in a vacuum. Mathematicians noticed that several useful systems shared the 
basic algebraic properties listed in the definition. Definitions usually emerge 
in this way, distilling common features of different interesting examples. 

Precise definitions are valuable; we couldn’t prove anything without them. 
For example, political discourse is often vapid because terms are not defined: 
what is a liberal; what is a conservative? A mathematician who asserts that 
there are infinitely many primes can be believed. But can you believe a politi- 
cian who says his opponent is a fool because he’s a liberal (or she’s a conser- 
vative)? 
Example 4.30. (i) $\mathbb{Z}$, $\mathbb{Q}$, and $\mathbb{R}$ are commutative rings. The ninth
fundamental property, reciprocals, does not hold in $\mathbb{Z}$; for example, $2^{-1} = \tfrac{1}{2}$
does not lie in $\mathbb{Z}$.

(ii) Propositions 3.8 and 3.9 show that $\mathbb{C}$ is a commutative ring, while
Proposition 3.11 shows that every nonzero complex number has an inverse.

(iii) The set of even integers does not form a commutative ring, for it has no
identity.

(iv) The Gaussian integers $\mathbb{Z}[i]$ form a commutative ring (see Exercise 4.64
on page 168).

(v) The Eisenstein integers $\mathbb{Z}[\omega]$, where $\omega$ is a primitive cube root of unity,
form a commutative ring (see Exercise 4.64).

(vi) More generally, the cyclotomic integers $\mathbb{Z}[\zeta]$, where $\zeta$ is any primitive
root of unity, form a commutative ring (see Exercise 4.65 on page 168).

(vii) The next theorem shows that $\mathbb{Z}_m$ is a commutative ring for every integer
$m \ge 2$.

(viii) We’ll see, in the next chapter, that the set of all polynomials whose
coefficients lie in a commutative ring (e.g., all polynomials with coefficients in $\mathbb{Z}$) is
itself a commutative ring with the usual addition and multiplication. ▲

Example 4.31. (i) If $R$ is a commutative ring and $S$ is a set, let $R^S$ be the
set of all functions $S \to R$. Define $u\colon S \to R$ to be the constant function
with value 1, where 1 is the identity element of $R$: that is, $u(s) = 1$ for
all $s \in S$. Define the sum and product of $f, g \in R^S$, for all $s \in S$, by

$$f + g\colon s \mapsto f(s) + g(s)$$

and

$$fg\colon s \mapsto f(s)g(s);$$

these operations are called pointwise addition and pointwise
multiplication. We leave the straightforward checking that $R^S$ is a
commutative ring as Exercise 4.34. An important special case of this example is
$\operatorname{Fun}(R) = R^R$, the ring of all functions from a commutative ring $R$ to
itself.

(ii) If $X = [a, b]$ is an interval on the line, then

$$C(X) = \{f\colon X \to \mathbb{R} : f \text{ is continuous}\}$$

is a commutative ring under pointwise operations. If both $f, g \in C(X)$
are continuous, then it is shown in calculus that both $f + g$ and $fg$ are
also continuous. The constant function $e$ with $e(t) = 1$ for all $t \in X$ is
continuous; we let the reader prove that the other axioms in the definition
of commutative ring hold. ▲
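Pointwise operations are exactly how one builds new functions from old ones in a programming language. The Python sketch below treats elements of $\operatorname{Fun}(\mathbb{R})$ as callables; the names `pointwise_add`, `pointwise_mul`, and `unit` are assumptions of this illustration, and floating-point numbers only approximate $\mathbb{R}$.

```python
import math

def pointwise_add(f, g):
    """Return the function f + g : s -> f(s) + g(s)."""
    return lambda s: f(s) + g(s)

def pointwise_mul(f, g):
    """Return the function fg : s -> f(s) * g(s)."""
    return lambda s: f(s) * g(s)

def unit(s):
    """The constant function u with value 1, the identity of the ring."""
    return 1

def identity_map(x):
    """The function x -> x."""
    return x

h = pointwise_add(identity_map, math.cos)   # the function x + cos x
k = pointwise_mul(identity_map, math.cos)   # the function x cos x

print(h(0.0), k(0.0))                       # 1.0 0.0
print(pointwise_mul(unit, math.cos)(2.0) == math.cos(2.0))   # True
```

Nothing here depends on the target being $\mathbb{R}$: the same two helpers work whenever the values of `f` and `g` can be added and multiplied.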


If $R = \mathbb{R}$, then $\operatorname{Fun}(\mathbb{R}) =
\mathbb{R}^{\mathbb{R}}$ arises in calculus.
After all, what are the
functions $x + \cos x$ and
$x \cos x$?


Etymology. The word ring was probably coined by Hilbert in 1897 when 
he wrote Zahlring. One of the meanings of the word ring, in German as in 
English, is “collection,” as in the phrase “a ring of thieves.” It has also been 
suggested that Hilbert used this term because, for a commutative ring such as 
the Gaussian integers $\mathbb{Z}[i]$, powers of some elements “cycle back” to being a
linear combination of smaller powers (for example, $i, i^2, i^3, i^4 = 1, i^5 = i$).
Theorem 4.32. $\mathbb{Z}_m$ is a commutative ring for every integer $m \ge 2$.


Proof. The proof of each of the eight statements is routine; in essence, they 
are inherited from the analogous statement in Z (the inheritance is made pos- 
sible by Proposition 4.5). We prove only statements (i), (vii), and (viii) in the 
definition of commutative ring; the other proofs are left to Exercise 4.31 below. 


Convince yourself that 
each step in these proofs 
is legitimate by supplying a 
reason. 


(i) $[a] + [b] = [a + b] = [b + a] = [b] + [a]$.

(vii) (Associativity of Multiplication):

$$[a]([b][c]) = [a][bc] = [a(bc)] = [(ab)c] = [ab][c] = ([a][b])[c].$$

(viii) (Distributivity):

$$\begin{aligned} [a]([b] + [c]) &= [a][b + c] = [a(b + c)] \\ &= [ab + ac] = [ab] + [ac] \\ &= [a][b] + [a][c]. \end{aligned}$$ ■
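Since $\mathbb{Z}_m$ is finite, the eight axioms can also be checked by brute force for any particular $m$. The Python sketch below does this with remainders $0, \dots, m - 1$ standing for the congruence classes; the function name `is_commutative_ring` is invented for the illustration, and an exhaustive check for a few values of $m$ is of course no substitute for the proof.

```python
from itertools import product

def is_commutative_ring(m):
    """Exhaustively test the axioms of a commutative ring for Z_m."""
    R = range(m)
    add = lambda a, b: (a + b) % m
    mul = lambda a, b: (a * b) % m
    for a, b, c in product(R, repeat=3):
        if add(a, b) != add(b, a):                           # (i)
            return False
        if add(a, add(b, c)) != add(add(a, b), c):           # (iv)
            return False
        if mul(a, b) != mul(b, a):                           # (v)
            return False
        if mul(a, mul(b, c)) != mul(mul(a, b), c):           # (vii)
            return False
        if mul(a, add(b, c)) != add(mul(a, b), mul(a, c)):   # (viii)
            return False
    for a in R:
        if add(a, 0) != a:                                   # (ii)
            return False
        if mul(1 % m, a) != a:                               # (vi)
            return False
        if not any(add(x, a) == 0 for x in R):               # (iii)
            return False
    return True

print(all(is_commutative_ring(m) for m in range(2, 13)))     # True
```

Each check succeeds for the reason given in the proof: the corresponding identity holds in $\mathbb{Z}$ and survives reduction mod $m$.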


A commutative ring is an algebraic system we view as a generalization of
ordinary arithmetic. One remarkable feature of the integers mod $m$ is that an
integer $a$ is divisible by $m$ if and only if $[a] = [0]$ in $\mathbb{Z}_m$ (for $m \mid a$ if and only
if $a \equiv 0 \bmod m$); that is, we have converted a statement about divisibility into
an equation.


Exercises 

4.31 * Prove the remaining parts of Theorem 4.32.

4.32 Prove that every commutative ring R has a unique identity 1 . 

4.33 (i) Prove that subtraction in Z is not an associative operation. 

(ii) Give an example of a commutative ring in which subtraction is associative. 

4.34 * If $R$ is a commutative ring and $S$ is a set, verify that $R^S$ is a commutative ring
under pointwise operations. (See Example 4.31.)

4.35 * Define the weird integers $W$ as the integers with the usual addition, but with
multiplication $*$ defined by

$$a * b = \begin{cases} ab & \text{if } a \text{ or } b \text{ is odd} \\ -ab & \text{if both } a \text{ and } b \text{ are even.} \end{cases}$$

Prove that $W$ is a commutative ring.

Hint. It is clear that 1 is the identity and that * is commutative; only associativity 
of * and distributivity must be checked. 

4.36 For each integer $a$ between 1 and 11, find all solutions to $[a]x = [9]$ in $\mathbb{Z}_{12}$.
(There may be no solutions for some $a$.)

4.37 In Zg, find all values of x so that (x — l)(x + 1) = 0. 

4.38 Solve the equation $x^2 + 3x - 3 = 0$ in $\mathbb{Z}_5$.

4.39 How many roots does the polynomial $x^2 + 1$ have in each of the following
commutative rings?

(i) $\mathbb{Z}_5$ (ii) $\mathbb{Z}_7$ (iii) $\mathbb{Z}_{11}$ (iv) $\mathbb{Z}_{101}$ (v) $\mathbb{Z}_{13}$
Properties of Commutative Rings

One advantage of precise definitions is that they are economical: proving a the- 
orem for general commutative rings automatically proves it for each particular 
commutative ring. For example, we need not prove that (— 1)(— 1) = 1 holds 
in the Gaussian integers Z[i] because we prove below that it holds in all com- 
mutative rings. The nice thing here is that some general proofs can be copied 
verbatim from those in Chapter 1. Alas, this is not always so. For example, the 
generalization of the Chinese Remainder Theorem does hold in $\mathbb{Z}[i]$, but its
proof requires more than merely copying, mutatis mutandis, its proof in Z. 

Proposition 4.33. For every $a$ in a commutative ring $R$, we have $a \cdot 0 = 0$.


Proof. Identical to the proof of Proposition 1.31. ■ 

Can $1 = 0$ in a commutative ring $R$? The answer is “yes,” but not really. If
$1 = 0$ in $R$, then $a = 1a = 0a = 0$ for all $a \in R$, by Proposition 4.33; that is,
$R$ consists of only one element, namely, 0. So, $1 \ne 0$ in any commutative ring
having more than one element. Commutative rings with only one element are
called zero rings; they are not very interesting, although they do arise every
once in a while. For example, Theorem 4.32 says that $\mathbb{Z}_m$ is a commutative
ring for every integer $m \ge 2$. Actually, $\mathbb{Z}_m$ is a commutative ring for $m \ge 0$:
we have $\mathbb{Z}_0 = \mathbb{Z}$, and $\mathbb{Z}_1$ the zero ring. Since zero rings arise rarely, we declare
that $1 \ne 0$ for all commutative rings in this book unless we say otherwise.


Proposition 4.34. For any $a$ in a commutative ring $R$, we have

$$(-a)(-1) = a.$$

In particular,

$$(-1)(-1) = 1.$$


Proof. Identical to the proof of Proposition 1.32. ■ 

Can an element a in a commutative ring R have more than one negative? 

Proposition 4.35. Let $R$ be a commutative ring. Negatives in $R$ are unique;
that is, for each $a \in R$, there is exactly one $a' \in R$ with $a + a' = 0$.

Multiplicative inverses, when they exist, are unique; that is, for each $b \in R$,
there is at most one $b' \in R$ with $bb' = 1$.


Proof. Identical to the proof of Proposition 1.33. As usual, the negative of $a$ is
denoted by $-a$, and the inverse of $b$, when it exists, is denoted by $b^{-1}$. ■

Corollary 4.36. For every $a$ in a commutative ring $R$, we have $-a = (-1)a$.
Moreover, if an element $b$ has an inverse, then $(b^{-1})^{-1} = b$.

Proof. Identical to the proof of Corollary 1 .34. ■ 

The distributive law for subtraction holds, where $b - c$ is defined as
$b + (-1)c$.


Corollary 4.37. If $a, b, c$ lie in a commutative ring $R$, then $a(b - c) = ab - ac$.

Proof. Identical to the proof of Corollary 1.35. ■

Definition. Let $R$ be a commutative ring. If $a \in R$, define its powers by
induction on $n \ge 0$. Set $a^0 = 1$ and, if $n \ge 0$, then $a^{n+1} = a a^n$.

We have defined $a^0 = 1$ for all $a \in R$; in particular, $0^0 = 1$.

The notation $a^n$ is a hybrid: $a$ is an element of $R$ while $n$ is an integer. Here
is the additive version of this notation.

Definition. If $R$ is a commutative ring, $a \in R$, and $k > 0$ is an integer, define
$ka = a + \cdots + a$, the sum of $a$ with itself $k$ times. If $k = 0$, define $ka = 0a = 0$,
where the 0 on the right is the zero element of $R$. If $k < 0$, then $-k = |k| > 0$,
and we define $ka = (-k)(-a)$; that is, $ka$ is the sum of $-a$ with itself $|k|$
times.

The hybrid $ka$ can be viewed as the product of two elements in the
commutative ring $R$. If $e = 1$ (the identity element in $R$), then $ke \in R$ and
$ka = (ke)a$. For example, if $k > 0$, then


$$ka = a + a + \cdots + a = (e + e + \cdots + e)a = (ke)a.$$


We note that we could have defined $ka$, for $k \ge 0$, by induction. Set $0a = 0$
and, if $k \ge 0$, then $(k + 1)a = a + ka$.

The Binomial Theorem holds in every commutative ring $R$. Since we have
defined $ka$ whenever $k$ is an integer and $a \in R$, the notation $\binom{n}{j}a$ makes sense.


Theorem 4.38 (Binomial Theorem). Let $R$ be a commutative ring.
(i) For all $x \in R$ and all integers $n \ge 0$,

$$(1 + x)^n = \sum_{j=0}^{n} \binom{n}{j} x^j = \sum_{j=0}^{n} \frac{n!}{j!\,(n-j)!}\, x^j.$$

(ii) For all $a, b \in R$ and all integers $n \ge 0$,

$$(a + b)^n = \sum_{j=0}^{n} \binom{n}{j} a^{n-j} b^j.$$


Proof. Identical to the proof of Theorem 2.25. ■ 


Units and Fields 

Let’s return to the ninth fundamental property of ordinary arithmetic. A nonzero
element in a commutative ring may not have an inverse. For example, $[2] \ne [0]$
in $\mathbb{Z}_4$, but there is no $[a] \in \mathbb{Z}_4$ with $[2][a] = [1]$: the products $[2][a]$ are

$$[2][0] = [0], \quad [2][1] = [2], \quad [2][2] = [4] = [0], \quad [2][3] = [6] = [2];$$

none of these is $[1]$.

If $m \ge 2$, which nonzero elements in $\mathbb{Z}_m$ have multiplicative inverses?

Proposition 4.39. Let $m \ge 2$. An element $[a] \in \mathbb{Z}_m$ has an inverse if and only
if $\gcd(a, m) = 1$.

Proof. If $\gcd(a, m) = 1$, Theorem 4.17 says that there is an integer $s$ so
that $sa \equiv 1 \bmod m$. Translating this congruence to $\mathbb{Z}_m$ (using the definition of
multiplication in $\mathbb{Z}_m$), we have

$$[s][a] = [sa] = [1];$$


thus, $[s]$ is the inverse of $[a]$.

Conversely, if $[s][a] = [1]$ in $\mathbb{Z}_m$, then $[sa] = [1]$ and $sa \equiv 1 \bmod m$.
Therefore, $m \mid (sa - 1)$, so that $sa - 1 = tm$ for some integer $t$, and
$\gcd(a, m) = 1$, by Proposition 1.23. ■
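
Proposition 4.39 is easy to test by machine. The following Python sketch is our own illustration and not part of the proof; the names `units` and `inverse_mod` are hypothetical. It lists the units of a given modulus by the gcd criterion, and it finds an inverse with the extended Euclidean algorithm, which is one standard way to produce an integer s with sa ≡ 1 mod m.

```python
from math import gcd

def units(m):
    """Residues a in 1..m-1 with gcd(a, m) == 1 (the units, by Proposition 4.39)."""
    return [a for a in range(1, m) if gcd(a, m) == 1]

def inverse_mod(a, m):
    """Return s in 0..m-1 with s*a congruent to 1 mod m, or None if gcd(a, m) != 1.

    Extended Euclidean algorithm; invariant: old_s * a is congruent to old_r mod m.
    """
    old_r, r = a % m, m
    old_s, s = 1, 0
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
    if old_r != 1:          # old_r is now gcd(a, m)
        return None
    return old_s % m

if __name__ == "__main__":
    print(units(15))           # the residues prime to 15
    print(inverse_mod(3, 7))   # the class [5], since 5 * 3 = 15
    print(inverse_mod(2, 4))   # None: [2] has no inverse in Z_4
```

For instance, `inverse_mod(2, 4)` returns `None`, in agreement with the table of products of [2] in the ring of integers mod 4 computed before the proposition.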

If $a$ and $m$ are relatively prime, then the coefficients $s$ and $t$ displaying 1 as
a linear combination are not unique (see Exercise 1.57 on page 35). However,
Proposition 4.35 shows that the congruence class of $s$ mod $m$ is unique: if
also $1 = s'a + t'm$, then $[s'] = [s]$, for both equal $[a]^{-1}$ in $\mathbb{Z}_m$; that is,
$s' \equiv s \bmod m$, for inverses are unique when they exist.

Dividing by an element $a \in R$ means multiplying by $a^{-1}$. Thus, dividing
by zero requires an element $0^{-1} \in R$ with $0^{-1} \cdot 0 = 1$. But we saw, in
Proposition 4.33, that $a \cdot 0 = 0$ for all $a \in R$: in particular, $0^{-1} \cdot 0 = 0$. It
follows that if $1 \ne 0$ in $R$, that is, if $R$ has more than one element, then $0^{-1}$
does not exist; therefore, we cannot divide by 0.


How to Think About It. There is a strong analogy between the method for
solving linear equations in elementary algebra and the proof of Theorem 4.17.
When solving an equation like $3x = 4$ in first-year algebra, you multiply both
sides by the number $u$ with $u \cdot 3 = 1$, namely, $u = \frac{1}{3}$:

$$3x = 4$$
$$\tfrac{1}{3}(3x) = \tfrac{1}{3} \cdot 4$$
$$x = \tfrac{4}{3}.$$

Now look at a congruence like $3x \equiv 4 \bmod 7$ as an equation in $\mathbb{Z}_7$,

$$[3]x = [4],$$

and go through the same steps as above, using the fact that $[5] \cdot [3] = [1]$:

$$[3]x = [4]$$
$$[5]([3]x) = [5] \cdot [4]$$
$$([5] \cdot [3])x = [6]$$
$$x = [6].$$

As we remarked on page 158, the notion of commutative ring allows us to turn 
congruences into equations that obey the usual rules of elementary algebra. 
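
The same two steps (find the inverse, then multiply both sides by it) can be carried out mechanically. The sketch below is only an illustration under our own naming (`inverse_by_search` and `solve_linear` are hypothetical); it finds the inverse by trying every residue, which is adequate for small moduli.

```python
def inverse_by_search(a, m):
    """Return u in 1..m-1 with u*a congruent to 1 mod m, or None if no such u exists."""
    for u in range(1, m):
        if (u * a) % m == 1:
            return u
    return None

def solve_linear(a, b, m):
    """Solve a*x = b in the integers mod m when a is a unit: x = a^{-1} * b."""
    u = inverse_by_search(a, m)
    if u is None:
        raise ValueError("a is not a unit mod m, so we cannot divide by it")
    return (u * b) % m

if __name__ == "__main__":
    x = solve_linear(3, 4, 7)
    print(x, (3 * x) % 7)   # the solution and a check that 3x reduces to 4
```

When a is not a unit, the function refuses to answer rather than guessing; such an equation may have no solution or several, and it needs a separate analysis.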


Notation. The various $\mathbb{Z}_m$ are important examples of commutative rings. It
is getting cumbersome, as in the above calculation, to decorate elements of
$\mathbb{Z}_m$ with brackets. From now on, we will usually drop the brackets, letting
the context make things clear. For example, the calculation in $\mathbb{Z}_7$ above will
usually be written

$$3x = 4$$
$$5(3x) = 5 \cdot 4$$
$$(5 \cdot 3)x = 6$$
$$x = 6.$$

Definition. An element $u$ in a commutative ring $R$ is a unit if it has a
multiplicative inverse in $R$; that is, there is $v \in R$ with $uv = 1$.

Note that $v$ must be in $R$ in order that $u$ be a unit in $R$. For example, 2 is
not a unit in $\mathbb{Z}$ because $\frac{1}{2}$ is not in $\mathbb{Z}$; of course, 2 is a unit in $\mathbb{Q}$.

Knowledge of the units in a commutative ring R tells us a great deal about 
how much elementary algebra carries over to R. For example, knowing whether 
or not a is a unit in R tells us whether or not we can solve the equation ax = b 
in R by dividing both sides by a . 

Example 4.40. (i) The only units in $\mathbb{Z}$ are $\pm 1$.

(ii) Proposition 4.39 describes all the units in $\mathbb{Z}_m$. It says that $[a]$ is a unit in
$\mathbb{Z}_m$ if and only if $\gcd(a, m) = 1$.

(iii) Every nonzero element of $\mathbb{Q}$, $\mathbb{R}$, and $\mathbb{C}$ is a unit. ▲

What are the units in $\mathbb{Z}[i]$? Our work in Chapter 3 lets us find the answer.
Every nonzero Gaussian integer $z$ has an inverse in $\mathbb{C}$, but that inverse may not
be in $\mathbb{Z}[i]$. Proposition 3.11 shows, in $\mathbb{C}$, that

$$\frac{1}{z} = \frac{\bar{z}}{z\bar{z}}.$$

The denominator on the right-hand side is none other than $N(z)$, the norm of $z$,
and this suggests the following proposition.

Proposition 4.41. A Gaussian integer $z$ is a unit in $\mathbb{Z}[i]$ if and only if $N(z) = 1$.

Proof. If $N(z) = 1$, the formula $z^{-1} = \bar{z}/(z\bar{z}) = \bar{z}/N(z)$ shows that $z^{-1} = \bar{z}$,
a Gaussian integer, and so $z$ is a unit in $\mathbb{Z}[i]$.

Conversely, if $z$ is a unit in $\mathbb{Z}[i]$, then there is a Gaussian integer $w$ with
$zw = 1$. Take the norm of both sides; Proposition 3.35(iii) gives

$$N(z)N(w) = 1.$$

This is an equation in $\mathbb{Z}$ saying that a product of two integers is 1. The only way
this can happen is for each factor to be $\pm 1$. But norms are always nonnegative,
by Proposition 3.35, and so $N(z)$ (and also $N(w)$) is equal to 1. ■

Proposition 4.41 leads to the question “Which Gaussian integers have norm
1?” If $z = a + bi$ is a Gaussian integer and $N(z) = 1$, then $a$ and $b$ are integers
with $a^2 + b^2 = 1$. Since $(a, b)$ is a lattice point on the unit circle,
its distance to the origin is 1, and we see that the only $(a, b)$ satisfying the
equation are

$$(1, 0), \quad (0, 1), \quad (-1, 0), \quad (0, -1).$$


Hence, we have

Proposition 4.42. There are exactly four units in $\mathbb{Z}[i]$, namely

$$1, \quad i, \quad -1, \quad -i.$$
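
As a sanity check on Proposition 4.42, one can search a small box of lattice points for Gaussian integers of norm 1. This is a sketch under our own conventions: a Gaussian integer a + bi is stored as the pair `(a, b)`, and the helper names are hypothetical. A box suffices because the norm equation forces both coordinates to have absolute value at most 1.

```python
def norm(a, b):
    """Norm of the Gaussian integer a + bi."""
    return a * a + b * b

def multiply(z, w):
    """Product of Gaussian integers stored as pairs (real part, imaginary part)."""
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)

def gaussian_units(bound=3):
    """All (a, b) with |a|, |b| <= bound and norm 1."""
    box = range(-bound, bound + 1)
    return [(a, b) for a in box for b in box if norm(a, b) == 1]

if __name__ == "__main__":
    for a, b in gaussian_units():
        # The inverse of a unit is its conjugate, as in the proof of Proposition 4.41.
        print((a, b), multiply((a, b), (a, -b)))
```

Each unit printed is followed by its product with its conjugate, which is the pair representing 1.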

We know that 0 is never a unit in a nonzero commutative ring $R$: what if
every nonzero element in $R$ is a unit?

Definition. A field is a nonzero commutative ring $F$ in which every nonzero
$a \in F$ is a unit; that is, there is $b \in F$ with $ab = 1$.

Familiar examples of fields are Q, R, and C; here is a new example. 

Theorem 4.43. If $m \ge 2$, then $\mathbb{Z}_m$ is a field if and only if $m$ is a prime.

Proof. If $m$ is prime and $0 < a < m$, then $\gcd(a, m) = 1$, and Proposition 4.39
says that $a$ is a unit in $\mathbb{Z}_m$. Hence, $\mathbb{Z}_m$ is a field.

Conversely, suppose that $m$ is not prime; that is, $m = ab$, where $0 < a$,
$b < m$. In $\mathbb{Z}_m$, both $a$ and $b$ are nonzero, and $ab = 0$. If $a$ has an inverse in
$\mathbb{Z}_m$, say, $s$, then $sa = 1$, which gives the contradiction:


$$0 = s0 = s(ab) = (sa)b = 1b = b. \;■$$
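
Theorem 4.43 can be checked by brute force for small moduli. The following sketch is an illustration only (the names `is_field` and `is_prime` are ours): it tests directly whether every nonzero residue has an inverse, and compares the answer with a naive primality test.

```python
def is_field(m):
    """True if every nonzero residue mod m has a multiplicative inverse."""
    return all(any((a * b) % m == 1 for b in range(1, m)) for a in range(1, m))

def is_prime(m):
    """Naive trial-division primality test, adequate for small m."""
    return m >= 2 and all(m % d != 0 for d in range(2, int(m ** 0.5) + 1))

def zero_divisor_pairs(m):
    """Pairs (a, b) of nonzero residues with a*b = 0 mod m, as in the proof."""
    return [(a, b) for a in range(1, m) for b in range(a, m) if (a * b) % m == 0]

if __name__ == "__main__":
    print([m for m in range(2, 30) if is_field(m)])
    print(zero_divisor_pairs(6))   # nonzero residues whose product is 0 mod 6
```

The second function mirrors the converse half of the proof: when m is composite, nonzero residues with product 0 appear, and such residues cannot be units.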


Who would have thought that a field could have a finite number of elements?
When one of us was a graduate student, a fellow student was tutoring a 10-year
old prodigy. To illustrate the boy’s talent, he described teaching him how to
multiply $2 \times 2$ matrices. As soon as he was shown that the $2 \times 2$ identity matrix
$I$ satisfies $IA = A$ for all matrices $A$, the boy immediately began writing;
after a few minutes he smiled, for he had just discovered that $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ has
an inverse if and only if $ad - bc \ne 0$! Later, when this boy was told the
definition of a field, he smiled as the usual examples of $\mathbb{Q}$, $\mathbb{R}$, and $\mathbb{C}$ were
trotted out. But when he was shown $\mathbb{Z}_2$, he threw a temper tantrum and ended
the lesson.

In Theorem 4.17, we considered linear congruences in one variable. We 
now consider linear systems in two variables. 

Theorem 4.44. If $p$ is a prime, then the system

$$ax + by \equiv u \bmod p$$
$$cx + dy \equiv v \bmod p$$

has a unique solution $(x, y)$ if and only if the determinant $ad - bc \not\equiv 0 \bmod p$.

Proof. Since $p$ is a prime, we know that $\mathbb{Z}_p$ is a field. Now the system of
congruences can be considered as a system of equations in $\mathbb{Z}_p$:

$$ax + by = u$$
$$cx + dy = v.$$

You can now complete the proof just as in linear algebra. ■


We have removed the brackets from the notation for elements of $\mathbb{Z}_m$.

Example 4.45. Find the solution in $\mathbb{Z}_7$ of the system

$$4x - 5y = -2$$
$$2x + 3y = 5.$$

We proceed as in linear algebra. The determinant is $4 \cdot 3 - (-5) \cdot 2 = 22 \ne 0$
in $\mathbb{Z}_7$, and so there is a solution. Now $4^{-1} = 2$ in $\mathbb{Z}_7$ (for $4 \cdot 2 = 8 \equiv 1 \bmod 7$),
so the top congruence can be rewritten as $x - 10y = -4$. Since $-10 = -3$ in $\mathbb{Z}_7$,
we have

$$x = 3y - 4.$$

Substituting into the bottom equation gives $2(3y - 4) + 3y = 5$; that is, $9y = 13$;
rewrite this as $2y = 6$. Multiply by $4 = 2^{-1}$ to obtain $y = 24 = 3$.
Finally, $x = 3y - 4 = 9 - 4 = 5$. Therefore, the solution is $(5, 3)$. Let’s check
this. If $x = 5$ and $y = 3$, then

$$4 \cdot 5 - 5 \cdot 3 = 5 \equiv -2 \bmod 7$$
$$2 \cdot 5 + 3 \cdot 3 = 19 \equiv 5 \bmod 7.$$ ▲


How to Think About It. Had you mimicked the method in the example
when proving Theorem 4.44, you would have found Cramer’s Rule, a generic
formula for the solution to the system

$$ax + by = u$$
$$cx + dy = v.$$

The solution is $(x, y)$, where

$$x = \frac{\det\begin{bmatrix} u & b \\ v & d \end{bmatrix}}{\det\begin{bmatrix} a & b \\ c & d \end{bmatrix}} \quad \text{and} \quad y = \frac{\det\begin{bmatrix} a & u \\ c & v \end{bmatrix}}{\det\begin{bmatrix} a & b \\ c & d \end{bmatrix}}.$$

Thus, Cramer’s Rule holds, giving us an easily remembered formula for
solving $2 \times 2$ systems of equations in any field. Most linear algebra courses present
a more general Cramer’s Rule for $n \times n$ systems.
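
Cramer’s Rule over the integers mod a prime is short enough to write out. The sketch below is our own illustration (the function name `cramer_mod_p` is hypothetical, and the inverse of the determinant is found by search, which is fine for small primes); it reproduces the solution of Example 4.45.

```python
def cramer_mod_p(a, b, c, d, u, v, p):
    """Solve ax + by = u, cx + dy = v in the integers mod a prime p.

    Returns the unique solution (x, y), or None when the determinant is 0 mod p.
    """
    det = (a * d - b * c) % p
    if det == 0:
        return None
    # Inverse of the determinant, found by trying every nonzero residue.
    det_inv = next(s for s in range(1, p) if (s * det) % p == 1)
    x = ((u * d - b * v) * det_inv) % p   # det [u b; v d] / det
    y = ((a * v - u * c) * det_inv) % p   # det [a u; c v] / det
    return x, y

if __name__ == "__main__":
    # The system of Example 4.45: 4x - 5y = -2, 2x + 3y = 5 in Z_7.
    print(cramer_mod_p(4, -5, 2, 3, -2, 5, 7))
```

Python’s `%` operator returns a nonnegative remainder for a positive modulus, so negative coefficients such as -5 and -2 need no special handling.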


Exercises 

4.40 Give an example of a commutative ring $R$ containing an element $a$ with $a \ne 0$,
$a \ne 1$, and $a^2 = a$.

4.41 * The notation in this exercise is that of Example 4.31.

(i) Find all the units in $\mathrm{Fun}(\mathbb{R}) = \mathbb{R}^{\mathbb{R}}$.

(ii) Prove that a continuous function $u : X \to \mathbb{R}$ is a unit in $C(X)$ if and only if
$u(t) \ne 0$ for all $t \in X$.

4.42 Let $R = \mathbb{Z}[\sqrt{3}] = \{a + b\sqrt{3} : a, b \in \mathbb{Z}\}$.

(i) Show, with the usual addition and multiplication of real numbers, that $R$ is a
commutative ring.

(ii) Show that $u = 2 + \sqrt{3}$ is a unit in $R$.

(iii) Show that $R$ has infinitely many units.

4.43 If $p$ is a prime, show that a quadratic polynomial with coefficients in $\mathbb{Z}_p$ has at
most two roots in $\mathbb{Z}_p$.

4.44 * Prove or give a counterexample. Let $R$ be a commutative ring.

(i) The product of two units in $R$ is a unit.

(ii) The sum of two units in $R$ is a unit.

4.45 * Describe all the units in the Eisenstein integers $\mathbb{Z}[\omega]$.

4.46 * Just as in $\mathbb{C}$, a root of unity in a ring $R$ is an element $a \in R$ with $a^n = 1$ for
some positive integer $n$.

Find all roots of unity in $\mathbb{Z}_m$ for all integers $m$ between 5 and 12.

4.47 * Show that $\mathbb{Z}_m$ contains exactly $\phi(m)$ units, where $\phi$ is the Euler $\phi$-function.

4.48 * Show that an element $u \in \mathbb{Z}_m$ is a unit if and only if $u$ is a root of unity.

4.49 * If $u$ is a unit in $\mathbb{Z}_m$, then Exercise 4.48 says there is some positive integer $n$
with $u^n = 1$; the smallest such $n$ is called the order of $u$ in $\mathbb{Z}_m$.

For each integer $m$ between 5 and 12, make a table that shows the units and
their orders. Any conjectures about which integers can be orders of units?

4.50 State and prove Cramer’s Rule for a $3 \times 3$ system of linear equations in a field.

4.51 Solve the system of congruences

$$3x - 2y + z \equiv 1 \bmod 7$$
$$x + y - 2z \equiv 0 \bmod 7$$
$$-x + 2y + z \equiv 4 \bmod 7.$$

4.52 For what values of $m$ will the system

$$2x + 5y = 1$$
$$x + 4y = 9$$

have a unique solution in $\mathbb{Z}_m$?

4.53 Find a system of two linear equations in two unknowns that has a unique solution
in $\mathbb{Z}_m$ for all $m \ge 2$.

4.54 (i) Show that

$$M_2 = \left\{ \begin{bmatrix} a & b \\ -b & a \end{bmatrix} : a, b \in \text{[illegible]} \right\}$$

is a commutative ring under matrix addition and multiplication.

(ii) What are the units in $M_2$?

4.55 * (i) Show that

$$F_4 = \left\{ \begin{bmatrix} a & b \\ b & a + b \end{bmatrix} : a, b \in \mathbb{Z}_2 \right\},$$

with binary operations matrix addition and multiplication, is a field having
exactly four elements.

(ii) Write out addition and multiplication tables for $F_4$.

More precisely, if $\alpha$ is the addition on $R$, then its restriction $\alpha|(S \times S)$ has
image in $S$, and it is the addition on $S$. Similarly for multiplication.

Query: Is $\mathbb{Z}_m$ a subring of $\mathbb{Z}$?

$A + B$ is exclusive or; that is, all $x \in X$ lying in either $A$ or $B$ but not in both. In
terms of Venn diagrams, this pictures the statement: Take it or leave it!

Recall that $B^A$ is the family of all functions from a set $A$ to a set $B$. Why is this
ring denoted by $2^X$? We’ll see why in Example 5.16.


Subrings and Subfields 

Sometimes, as with Z and Q, one ring sits inside another ring. 

Definition. A subring of a commutative ring $R$ is a commutative ring $S$
contained in $R$ that has the same 1, the same addition, and the same multiplication
as $R$; that is, $1 \in S$ and if $a, b \in S$, then $a + b \in S$ and $ab \in S$.

Each commutative ring on the list $\mathbb{Z} \subseteq \mathbb{Q} \subseteq \mathbb{R} \subseteq \mathbb{C}$ is a subring of the next
one. Example A.20 in Appendix A.3 says that if $R$ is a commutative ring and
$k \subseteq R$ is a subring that is a field, then $R$ is a vector space over $k$. Thus, $\mathbb{C}$ is a
vector space over $\mathbb{R}$ (and also over $\mathbb{Q}$), and $\mathbb{R}$ is a vector space over $\mathbb{Q}$.

Proposition 4.46. A subset $S$ of a commutative ring $R$ is a subring of $R$ if and
only if

(i) $1 \in S$;

(ii) if $a, b \in S$, then $a + b \in S$;

(iii) if $a, b \in S$, then $ab \in S$.

Proof. If $S$ is a subring of $R$, then the three properties clearly hold.

Conversely, if $S$ satisfies the three properties, then $S$ contains 1, and so it
only remains to show that $S$ is a commutative ring. Items (ii) and (iii) (closure
under addition and multiplication) show that the (restrictions of) addition and
multiplication are binary operations on $S$. All the other items in the definition
of commutative ring are inherited from $R$. For example, the distributive law
holds: since $a(b + c) = ab + ac$ holds for all $a, b, c \in R$, it holds, in particular,
for all $a, b, c \in S \subseteq R$. ■

Proposition 4.46 is more powerful than it looks. A subset $S$ of a commutative
ring $R$ is a subring if, using the same operations as those in $R$, it satisfies
all the conditions in the definition of commutative ring. But there’s no need
to check all the properties; you need check only three of them. For example,
Exercise 4.64 on page 168 asks you to prove that the Gaussian integers $\mathbb{Z}[i]$
and the Eisenstein integers $\mathbb{Z}[\omega]$ are commutative rings. This could be tedious:
there are ten things in the definition of commutative ring to check: addition
and multiplication are binary operations and the eight axioms. However, if we
know that $\mathbb{C}$ is a commutative ring and $\mathbb{Z}[i]$ and $\mathbb{Z}[\omega]$ are subrings of $\mathbb{C}$ (facts
that can be established via Proposition 4.46), then both $\mathbb{Z}[i]$ and $\mathbb{Z}[\omega]$ are
commutative rings in their own right.

Example 4.47. Here is an example of a commutative ring arising from set
theory. If $A$ and $B$ are subsets of a set $X$, then their symmetric difference is

$$A + B = (A \cup B) - (A \cap B)$$

(see Figure 4.2). If $U$ and $V$ are subsets of a set $X$, then

$$U - V = \{x \in X : x \in U \text{ and } x \notin V\}.$$

Let $X$ be a set, let $2^X$ denote the set of all the subsets of $X$, define addition
on $2^X$ to be symmetric difference, and define multiplication on $2^X$ to be
intersection. Exercises 4.68 through 4.74 on page 169 essentially show that $2^X$ is a
commutative ring. The empty set $\varnothing$ is the zero element, for $A + \varnothing = A$, while
each subset $A$ is its own negative, for $A + A = \varnothing$. These exercises also show
that symmetric difference is associative and that the distributive law holds.
Finally, $X$ itself is the identity element, for $X \cap A = A$ for every subset $A$. We
call $2^X$ a Boolean ring.

Suppose now that $Y \subsetneq X$ is a proper subset of $X$; is $2^Y$ a subring of $2^X$?
If $A$ and $B$ are subsets of $Y$, then $A + B$ and $A \cap B$ are also subsets of $Y$;
that is, $2^Y$ is closed under the addition and multiplication on $2^X$. However, the
identity element in $2^Y$ is $Y$, not $X$, and so $2^Y$ is not a subring of $2^X$. ▲
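
For a small set, the claims in Example 4.47 can be verified exhaustively. The following sketch is an illustration only: it models subsets as Python `frozenset` objects, uses the built-in operators `^` (symmetric difference) and `&` (intersection) for the two operations, and takes a three-element set as an arbitrary test case. The helper names are ours.

```python
from itertools import combinations

def power_set(X):
    """All subsets of X, as frozensets."""
    items = sorted(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def add(A, B):
    """Addition in the Boolean ring: symmetric difference."""
    return A ^ B

def mul(A, B):
    """Multiplication in the Boolean ring: intersection."""
    return A & B

X = frozenset({1, 2, 3})
P = power_set(X)
ZERO, ONE = frozenset(), X

axioms_hold = (
    all(add(A, ZERO) == A and add(A, A) == ZERO and mul(ONE, A) == A for A in P)
    and all(add(A, add(B, C)) == add(add(A, B), C)
            for A in P for B in P for C in P)
    and all(mul(A, add(B, C)) == add(mul(A, B), mul(A, C))
            for A in P for B in P for C in P)
)

# Subsets of a proper subset Y are closed under both operations,
# but their multiplicative identity is Y rather than X.
Y = frozenset({1, 2})
PY = power_set(Y)
closed = all(add(A, B) in PY and mul(A, B) in PY for A in PY for B in PY)
identity_of_PY = next(E for E in PY if all(mul(E, A) == A for A in PY))

if __name__ == "__main__":
    print(axioms_hold, closed, identity_of_PY == Y, identity_of_PY == X)
```

A finite check like this is not a proof for arbitrary sets; it only confirms that the operations behave as the example describes in one small case.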

The example of $2^X$ may have surprised you. It was natural for us to
introduce the notion of commutative ring, for we had already seen many
examples of numbers or of functions in which addition and multiplication make
sense and obey the usual rules. But the elements of $2^X$ are neither numbers nor
functions. And even though we call their binary operations addition and
multiplication, they are operations from set theory. This is a happy circumstance,
which we will exploit in the next chapter. It’s not really important what we
call addition and multiplication; what is important is that the operations satisfy
eight fundamental properties; that is, the axioms in the definition of
commutative ring.

Just as the notion of a subring of a commutative ring is useful, so too is the 
notion of a subfield of a field. 

Definition. If $F$ is a field, then a subfield of $F$ is a subring $k \subseteq F$ that is also
a field.

For example, $\mathbb{Q}$ is a subfield of $\mathbb{R}$, and both $\mathbb{Q}$ and $\mathbb{R}$ are subfields of $\mathbb{C}$.

There is a shortcut for showing that a subset is a subfield.

Proposition 4.48. A subring $k$ of a field $F$ is a subfield of $F$ if and only if
$a^{-1} \in k$ for all nonzero $a \in k$.

Proof. This is Exercise 4.57 below. ■


Exercises 

4.56 Give an example of a subring of a field that is not a field.

4.57 * Prove Proposition 4.48.

4.58 (i) Show that $\{0, 2\} \subseteq \mathbb{Z}_4$ has the same addition and multiplication tables as $\mathbb{Z}_2$.

(ii) Is $\mathbb{Z}_2$ a subring of $\mathbb{Z}_4$?

(iii) Is $\{0, 2, 4, 6\}$ a subring of $\mathbb{Z}_8$?

4.59 Let $R = \mathbb{Z}[\sqrt{-3}] = \{a + b\sqrt{-3} : a, b \in \mathbb{Z}\}$.

(i) Show that $R$ is a subring of the Eisenstein integers.

(ii) What are the units in $R$?

4.60 (i) If $S$ and $T$ are subrings of a ring $R$, show that $S \cap T$ is also a subring of $R$.

(ii) Show that the intersection of the Gaussian and Eisenstein integers is $\mathbb{Z}$.

4.61 *

(i) If $(S_i)_{i \in I}$ is a family of subrings of a commutative ring $R$, prove that their
intersection $\bigcap_{i \in I} S_i$ is also a subring of $R$.

(ii) If $X$ is a subset of a commutative ring $R$, define $G(X)$, the subring generated
by $X$, to be the intersection of all the subrings of $R$ that contain $X$.

Prove that $G(X)$ is the smallest subring containing $X$ in the following
sense: if $S$ is any subring of $R$ containing $X$, then $G(X) \subseteq S$.

(iii) Let $(S_i)_{i \in I}$ be a family of subrings of a commutative ring $R$, each of which
is a field. Prove that the subring $\bigcap_{i \in I} S_i$ is a field. Conclude that the
intersection of a family of subfields of a field is a subfield.

4.62 Let $p$ be a prime and let $A_p$ be the set of all fractions with denominator a power
of $p$.

(i) Show, with the usual operations of addition and multiplication, that $A_p$ is a
subring of $\mathbb{Q}$.

(ii) Describe the smallest subring of $\mathbb{Q}$ that contains both $A_2$ and $A_5$.

4.63 Let $p$ be a prime and let $Q_p$ be the set of rational numbers whose denominator
(when written in lowest terms) is not divisible by $p$.

(i) Show, with the usual operations of addition and multiplication, that $Q_p$ is a
subring of $\mathbb{Q}$.

(ii) Show that $Q_2 \cap Q_5$ is a subring of $\mathbb{Q}$.

(iii) Is [illegible] a field? Explain.

(iv) What is $Q_p \cap A_p$, where $A_p$ is defined in Exercise 4.62?

4.64 *

(i) Prove that $\mathbb{Z}[i] = \{a + bi : i^2 = -1 \text{ and } a, b \in \mathbb{Z}\}$, the Gaussian integers,
is a commutative ring.

(ii) Prove that $\mathbb{Z}[\omega] = \{a + b\omega : \omega^3 = 1 \text{ and } a, b \in \mathbb{Z}\}$, the Eisenstein integers,
is a commutative ring.

4.65 * Prove that $\mathbb{Z}[\zeta] = \{a + b\zeta^i : 0 \le i < n \text{ and } a, b \in \mathbb{Z}\}$ is a commutative ring,
where $\zeta$ is a primitive $n$th root of unity.

4.66 * It may seem more natural to define addition in $2^X$ as union rather than
symmetric difference. Is $2^X$ a commutative ring if addition $A \oplus B$ is defined as $A \cup B$
and $AB$ is defined as $A \cap B$?

4.67 If $X$ is a finite set with exactly $n$ elements, how many elements are in $2^X$?

4.68 * If $A$ and $B$ are subsets of a set $X$, prove that $A \subseteq B$ if and only if $A = A \cap B$.

4.69 * Recall that if $A$ is a subset of a set $X$, then its complement is

$$A^c = \{x \in X : x \notin A\}.$$

Prove, in the commutative ring $2^X$, that $A^c = X + A$.

4.70 * Let $A$ be a subset of a set $X$. If $S \subseteq X$, prove that $A^c = S$ if and only if
$A \cup S = X$ and $A \cap S = \varnothing$.

4.71 Let $A$, $B$, $C$ be subsets of a set $X$.

(i) Prove that $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$.

(ii) Prove that $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$.

4.72 If $A$ and $B$ are subsets of a set $X$, then $A - B = \{x \in A : x \notin B\}$. Prove that
$A - B = A \cap B^c$. In particular, $X - B = B^c$, the complement of $B$.

4.73 * Let $A$ and $B$ be subsets of a set $X$. Prove the De Morgan laws:

$$(A \cup B)^c = A^c \cap B^c \quad \text{and} \quad (A \cap B)^c = A^c \cup B^c,$$

where $A^c$ denotes the complement of $A$.

4.74 * If $A$ and $B$ are subsets of a set $X$, define their symmetric difference by
$A + B = (A - B) \cup (B - A)$ (see Figure 4.2).

(i) Prove that $A + B = (A \cup B) - (A \cap B)$.

(ii) Prove that $(A + B) \cup (A \cap B) = A \cup B$.

(iii) Prove that $A + A = \varnothing$.

(iv) Prove that $A + \varnothing = A$.

(v) Prove that $A + (B + C) = (A + B) + C$.

Hint. Show that each of $A + (B + C)$ and $(A + B) + C$ is described by
Figure 4.3.

(vi) Prove that the Boolean ring $2^X$ is not a field if $X$ has at least two elements.

4.75 Prove that $A \cap (B + C) = (A \cap B) + (A \cap C)$.

4.4 Connections: Julius and Gregory 

On what day of the week was July 4, 1776? We’ll use congruence to answer 
this question. In fact, we’ll answer in two ways: with an exact formula com- 
puting the day, and with a faster refinement, due to Conway. 

Let’s begin by seeing why our calendar is complicated. A year is the amount 
of time it takes the Earth to make one complete orbit around the Sun; a day is 
the amount of time it takes the Earth to make a complete rotation about the axis 
through its north and south poles. There is no reason why the number of days 
in a year should be an integer, and it isn’t; a year is approximately 365.2422 
days long. In 46 BCE, Julius Caesar (and his scientific advisors) changed the
old Roman calendar, creating the Julian calendar containing a leap year every
four years; that is, every fourth year has an extra day, namely, February 29, and
so it contains 366 days (a common year is a year that is not a leap year). This
would be fine if the year were exactly 365.25 days long, but it has the effect of
making the year $365.25 - 365.2422 = 0.0078$ days (about 11 minutes and 14
seconds) too long. After 128 years, a full day was added to the calendar; that
is, the Julian calendar overcounted the number of days. In the year 1582, the
vernal equinox (the Spring day on which there are exactly 12 hours of daylight 
and 12 hours of night) occurred on March 11 instead of on March 21. Pope 
Gregory XIII (and his scientific advisors) then installed the Gregorian calen- 
dar by erasing 10 days that year; the day after October 4, 1582 was October 
15, 1582. This caused confusion and fear among the people; they thought their 
lives had been shortened by ten days. 

The Gregorian calendar modified the Julian calendar as follows. Call a 
year y ending in 00 a century year. If a year y is not a century year, then 
it is a leap year if it is divisible by 4; if y is a century year, it is a leap year only 
if it is divisible by 400. For example, 1900 is not a leap year, but 2000 is a leap 
year. The Gregorian calendar is the one in common use today, but it was not 
uniformly adopted throughout Europe. For example, the British empire didn’t 
accept it until 1752, when 11 days were erased, and the Russians didn’t ac- 
cept it until 1918, when 13 days were erased (thus, Trotsky called the Russian 
revolution, which occurred in 1917, the October Revolution, even though it 
occurred in November of the Gregorian calendar). 

The true number of days in 400 years is about

$$400 \times 365.2422 = 146{,}096.88 \text{ days}.$$

In this period, the Julian calendar has

$$400 \times 365 + 100 = 146{,}100 \text{ days},$$

while the Gregorian calendar, which eliminates three leap years from this time
period, has 146,097 days. Thus, the Julian calendar gains about 3.12 days every
400 years, while the Gregorian calendar gains only 0.12 days (about 2 hours
and 53 minutes).
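
The arithmetic in this comparison is easy to reproduce. The sketch below is an illustration that takes the figure of 365.2422 days per year from the text as given; it uses exact fractions rather than floating point so that the decimal values come out exactly. The variable names are ours.

```python
from fractions import Fraction

TRUE_YEAR = Fraction(3652422, 10000)      # 365.2422 days, the value used in the text

true_days = 400 * TRUE_YEAR               # days in 400 true years
julian_days = 400 * 365 + 100             # one leap day every four years
gregorian_days = julian_days - 3          # three century leap days removed

julian_gain = julian_days - true_days
gregorian_gain = gregorian_days - true_days
gregorian_gain_minutes = gregorian_gain * 24 * 60

if __name__ == "__main__":
    print(float(true_days), julian_days, gregorian_days)
    print(float(julian_gain), float(gregorian_gain), float(gregorian_gain_minutes))
```

The last value is the Gregorian excess over 400 years in minutes; dividing by 60 gives the "about 2 hours and 53 minutes" quoted above.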


Historical Note. There are 1628 years from 46 BCE to 1582 CE. The Julian 
calendar overcounts one day every 128 years, and so it overcounted 12 days in 
this period (for 12 x 128 = 1536). Why didn’t Gregory have to erase 12 days? 
The Council of Nicaea, meeting in the year 325 CE, defined Easter as the first 
Sunday strictly after the Paschal full moon, which is the first full moon on or 
after the vernal equinox (now you know why Pope Gregory was interested in 
the calendar). The vernal equinox in 325 CE fell on March 21, and the Synod
of Whitby, in 664 CE, officially defined the vernal equinox to be March 21. The
discrepancy observed in 1582 was thus the result of only 1257 = 1582 - 325
years of the Julian calendar: approximately 10 days. 


We now seek a calendar formula. For easier calculation, choose 0000 as our
reference year, even though there was no year zero! Assign a number to each
day of the week, according to the scheme

    Sun   Mon   Tues   Wed   Thurs   Fri   Sat
     0     1     2      3      4      5     6

In particular, March 1, 0000, has some number $a_0$, where $0 \le a_0 \le 6$. In the
next year 0001, March 1 has number $a_0 + 1$ (mod 7), for 365 days have elapsed
from March 1, 0000, to March 1, 0001, and

$$365 = 52 \times 7 + 1 \equiv 1 \bmod 7.$$

Similarly, March 1, 0002, has number $a_0 + 2$, and March 1, 0003, has number
$a_0 + 3$. However, March 1, 0004, has number $a_0 + 5$, for February 29, 0004,
fell between March 1, 0003, and March 1, 0004, and so $366 \equiv 2 \bmod 7$ days
had elapsed since the previous March 1. We see, therefore, that every common
year adds 1 to the previous number for March 1, while each leap year adds 2.
Thus, if March 1, 0000, has number $a_0$, then the number $a'$ of March 1, year $y$,
is

$$a' \equiv a_0 + y + L \bmod 7,$$

where $L$ is the number of leap years from year 0001 to year $y$. To compute $L$,
count all those years divisible by 4, then throw away all the century years, and
then put back those century years that are leap years. Thus,

$$L = \lfloor y/4 \rfloor - \lfloor y/100 \rfloor + \lfloor y/400 \rfloor,$$

where $\lfloor x \rfloor$ denotes the greatest integer in $x$. Therefore, we have

$$a' \equiv a_0 + y + L \equiv a_0 + y + \lfloor y/4 \rfloor - \lfloor y/100 \rfloor + \lfloor y/400 \rfloor \bmod 7.$$

We can actually find $a_0$ by looking at a calendar. Since March 1, 2012, fell
on a Thursday,

$$4 \equiv a_0 + 2012 + \lfloor 2012/4 \rfloor - \lfloor 2012/100 \rfloor + \lfloor 2012/400 \rfloor \equiv a_0 + 2012 + 503 - 20 + 5 \bmod 7,$$

and so

$$a_0 \equiv -2496 \equiv -4 \equiv 3 \bmod 7$$

(that is, March 1, 0000 fell on Wednesday). We can now determine the day of
the week $a'$ on which March 1 will fall in any year $y > 0$, for

$$a' \equiv 3 + y + \lfloor y/4 \rfloor - \lfloor y/100 \rfloor + \lfloor y/400 \rfloor \bmod 7.$$
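
The formula for March 1 translates directly into code. The sketch below is an illustration with function names of our own choosing; it also compares the formula with Python’s standard `datetime` module, which (as an assumption worth stating) extends the Gregorian leap-year rule backward to year 1, the same convention used in the derivation above.

```python
from datetime import date

def leap_count(y):
    """L: the number of Gregorian leap years from year 1 through year y."""
    return y // 4 - y // 100 + y // 400

def march_first(y):
    """Number (Sunday = 0, ..., Saturday = 6) of March 1 in year y, by the formula."""
    return (3 + y + leap_count(y)) % 7

def march_first_from_library(y):
    """The same number computed by datetime, whose weekday() has Monday = 0."""
    return (date(y, 3, 1).weekday() + 1) % 7

if __name__ == "__main__":
    print(march_first(2012))   # 4, a Thursday, the value used to calibrate a_0
    print(all(march_first(y) == march_first_from_library(y)
              for y in range(1, 3000)))
```

Integer division `//` plays the role of the greatest-integer function here, since all the quantities involved are nonnegative.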


Historical Note. There is a reason we have been discussing March 1, for 
it was the first day of the year in the old Roman calendar (753 BCE). There 
were only ten months: Martius, . . . , Iunius, Quintilis, Sextilis, Septembris, 
. . . , Decembris (which explains why September is so named; originally, it was 
month 7). In 713 BCE, Numa added January and February, and the Julian cal- 
endar changed the names of Quintilis and Sextilis to July and August. 


Let us now analyze February 28. For example, suppose that February 28,
1600, has number $b$. As 1600 is a leap year, February 29, 1600, occurs between
February 28, 1600, and February 28, 1601; hence, 366 days have elapsed
between these two February 28s, so that February 28, 1601, has number $b + 2$.
February 28, 1602, has number $b + 3$, February 28, 1603, has number $b + 4$,
February 28, 1604, has number $b + 5$, but February 28, 1605, has number $b + 7$
(for there was a February 29 in 1604).

Let us compare the pattern of behavior of February 28, 1600, namely, $b$,
$b + 2$, $b + 3$, $b + 4$, $b + 5$, $b + 7$, with that of some date in 1599. If May 26,
1599, has number $c$, then May 26, 1600, has number $c + 2$, for February 29,
1600, comes between these two May 26s, and so there are $366 \equiv 2 \bmod 7$
intervening days. The numbers of the next few May 26s, beginning with May 26,
1601, are $c + 3$, $c + 4$, $c + 5$, $c + 7$. We see that the pattern of the days for
February 28, starting in 1600, is exactly the same as the pattern of the days
for May 26, starting in 1599; indeed, the same is true for any date in January
or February. Thus, the pattern of the days for any date in January or February
of a year $y$ is the same as the pattern for a date occurring in the preceding
year $y - 1$: a year preceding a leap year adds 2 to the number for such a date,
whereas all other years add 1. Therefore, we pretend we have reverted to the
ancient calendar by making New Year’s Day fall on March 1, so that any date
in January or February is treated as if it had occurred in the previous year.


Historical Note. George Washington’s birthday, in the Gregorian calendar, 
is February 22, 1732. But the Gregorian calendar was not introduced in the 
British colonies until 1752. Thus, his original birthday was February 11. But 
New Year’s Day was also changed; before 1752, England and its colonies cele- 
brated New Year’s Day on March 25; hence, February, which had been in 1731, 
was regarded, after the calendar change, as being in 1732. George Washington 
used to joke that not only did his birthday change, but so did his birth year. See 
Exercise 4.80 on page 176. 


How do we find the day corresponding to a date other than March 1 ? Since 
March 1, 0000, has number 3 (as we have seen above), April 1, 0000, has 
number 6, for March has 31 days and 3 + 31 = 6 mod 7. Since April has 30 
days, May 1, 0000, has number 6 + 30 = 1 mod 7. Figure 4.4 is the table 
giving the number of the first day of each month in year 0000. 

Remember that we are pretending that March is month 1, April is month 2,
and so on. Let us denote these numbers by $1 + j(m)$, where $j(m)$, for
$m = 1, 2, \ldots, 12$, is defined by

\[ j(m) : 2, 5, 0, 3, 5, 1, 4, 6, 2, 4, 0, 3. \]

It follows that month $m$, day 1, year $y$, has number

\[ 1 + j(m) + g(y) \bmod 7, \]

where

\[ g(y) = y + \lfloor y/4 \rfloor - \lfloor y/100 \rfloor + \lfloor y/400 \rfloor. \]

Note that $a_0 = 1 + j(1)$, so that the values of $j(m)$ depend on our knowing $a_0$.
Here’s a formula for $j(m)$:

\[ j(m) = \lfloor 2.6m - 0.2 \rfloor, \quad \text{where } 1 \le m \le 12; \]

the values are displayed in Figure 4.4. This formula is not quite accurate. For
example, this number for December, that is, for $m = 10$, is
$\lfloor 2.6m - 0.2 \rfloor = 25$; but $j(10) = 4$. However, $25 \equiv 4 \bmod 7$, and so the
formula for $j(m)$ really gives the congruence class mod 7.
Date          Number     Date          Number     Date          Number

March 1       2          July 1        5          November 1    2
April 1       5          August 1      1          December 1    4
May 1         0          September 1   4          January 1     0
June 1        3          October 1     6          February 1    3

Figure 4.4. Values of $j(m)$.


Theorem 4.49 (Calendar Formula). The date with month $m$, day $d$, year $y$
has number

\[ d + j(m) + g(y) \bmod 7, \]

where $j(m)$ is given in Figure 4.4,

\[ g(y) = y + \lfloor y/4 \rfloor - \lfloor y/100 \rfloor + \lfloor y/400 \rfloor, \]

and dates in January and February are treated as having occurred in the
previous year.

Proof. The number mod 7 corresponding to month $m$, day 1, year $y$, is

\[ 1 + j(m) + g(y). \]

It follows that $2 + j(m) + g(y)$ corresponds to month $m$, day 2, year $y$, and,
more generally, $d + j(m) + g(y)$ corresponds to month $m$, day $d$, year $y$. ■

Let’s find the day of the week on which July 4, 1776 fell; here $m = 5$,
$d = 4$, and $y = 1776$. Substituting in the formula, we obtain the number

\[ 4 + 5 + 1776 + 444 - 17 + 4 = 2216 \equiv 4 \bmod 7; \]

therefore, July 4, 1776, fell on a Thursday.
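
The Calendar Formula can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `J`, `g`, and `day_number` are ours, the function takes the everyday month number (January = 1) and performs the shift to the March-first year itself, and days are numbered with Sunday = 0. The helper `agrees_with_datetime` cross-checks the result against Python's standard `datetime` module, which uses the Gregorian calendar throughout.

```python
from datetime import date

# j(m) from Figure 4.4, listed for the shifted months March = 1, ..., February = 12.
J = [2, 5, 0, 3, 5, 1, 4, 6, 2, 4, 0, 3]

def g(y: int) -> int:
    """g(y) = y + floor(y/4) - floor(y/100) + floor(y/400)."""
    return y + y // 4 - y // 100 + y // 400

def day_number(month: int, day: int, year: int) -> int:
    """Day of the week (0 = Sunday, ..., 6 = Saturday) of a Gregorian date.

    `month` is the everyday month number (January = 1).
    """
    if month <= 2:
        # January and February are treated as months 11 and 12 of the previous year.
        m, y = month + 10, year - 1
    else:
        m, y = month - 2, year
    return (day + J[m - 1] + g(y)) % 7

def agrees_with_datetime(month: int, day: int, year: int) -> bool:
    """Compare with datetime, whose weekday() has Monday = 0."""
    return day_number(month, day, year) == (date(year, month, day).weekday() + 1) % 7

NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]
print(NAMES[day_number(7, 4, 1776)])  # Thursday, as computed in the text
```

Keeping the month shift inside the function means a caller never has to remember that January and February belong to the previous year.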

Example 4.50. Does every year $y$ contain a Friday 13? We have

\[ 5 \equiv 13 + j(m) + g(y) \bmod 7. \]

The question is answered positively if the numbers $j(m)$, as $m$ varies from 1
through 12, give all the remainders 0 through 6 mod 7. And this is what
happens. The sequence of remainders mod 7 is

\[ 2, 5, [0, 3, 5, 1, 4, 6, 2], 4, 0, 3. \]

Indeed, we see that there must be a Friday 13 occurring between May and
November. No number occurs three times on the list, but it is possible that
there are three Friday 13s in a year because January and February are viewed as
having occurred in the previous year; for example, there were three Friday 13s
in 1987 (see Exercise 4.79 on page 176). Of course, we may replace Friday by
any other day of the week, and we may replace 13 by any number between 1
and 28. ▲
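
The claims in Example 4.50 are easy to check by brute force. The sketch below does not use the Calendar Formula; it relies on Python's standard `datetime` module and a helper name, `friday_13_months`, that is our own.

```python
from datetime import date

def friday_13_months(year: int) -> list:
    """Months (January = 1) of `year` in which the 13th is a Friday."""
    # date.weekday() numbers Monday as 0, so Friday is 4.
    return [m for m in range(1, 13) if date(year, m, 13).weekday() == 4]

print(friday_13_months(1987))  # [2, 3, 11]: February, March, November

# How many Friday 13s can a year have? Scan one full 400-year Gregorian cycle.
counts = {len(friday_13_months(y)) for y in range(1600, 2000)}
print(sorted(counts))
```

The scan covers a complete 400-year cycle of the Gregorian calendar, after which the pattern of weekdays repeats.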


The word calendar comes 
from the Greek “to call,” 
which evolved into the 
Latin word for the first day 
of a month (when accounts 
were due). 
Most of us need paper and pencil (or a calculator) to use the calendar
formula in the theorem, but here’s a way to simplify the formula so you can do
the calculation in your head and amaze your friends. A mnemonic for $j(m)$ is
the sentence

My Uncle Charles has eaten a cold supper; he eats nothing hot.
2  5     (7 ≡ 0) 3   5     1 4    6       2  4    (7 ≡ 0) 3

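Corollary 4.51. The date with month $m$, day $d$, year $y = 100C + N$, where
$0 \le N \le 99$, has number

\[ d + j(m) + N + \lfloor N/4 \rfloor + \lfloor C/4 \rfloor - 2C \bmod 7, \]

provided that dates in January and February are treated as having occurred in
the previous year.

Proof. If we write a year $y = 100C + N$, where $0 \le N \le 99$, then

\[ y = 100C + N \equiv 2C + N \bmod 7, \]

\[ \lfloor y/4 \rfloor = 25C + \lfloor N/4 \rfloor \equiv 4C + \lfloor N/4 \rfloor \bmod 7, \]

\[ \lfloor y/100 \rfloor = C, \quad \text{and} \quad \lfloor y/400 \rfloor = \lfloor C/4 \rfloor. \]

Therefore,

\[
\begin{aligned}
y + \lfloor y/4 \rfloor - \lfloor y/100 \rfloor + \lfloor y/400 \rfloor
  &\equiv N + 5C + \lfloor N/4 \rfloor + \lfloor C/4 \rfloor \bmod 7 \\
  &\equiv N + \lfloor N/4 \rfloor + \lfloor C/4 \rfloor - 2C \bmod 7. \quad ■
\end{aligned}
\]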

This formula is simpler than the first one. For example, the number corre- 
sponding to July 4, 1776 is now obtained as 

\[ 4 + 5 + 76 + 19 + 4 - 34 = 74 \equiv 4 \bmod 7, \]

agreeing with our calculation above. The reader may now compute the day of 
his or her birth. 


Example 4.52. The birthday of Rose, the grandmother of Danny and Ella, was
January 1, 1909; on what day of the week was she born?

January is counted as
belonging to the previous
year 1908.

We use Corollary 4.51. If $A$ is the number of the day, then $j(m) = 0$ (for
January corresponds to month 11), and

\[
\begin{aligned}
A &= 1 + 0 + 8 + \lfloor 8/4 \rfloor + \lfloor 19/4 \rfloor - 38 \\
  &\equiv -23 \bmod 7 \\
  &\equiv 5 \bmod 7.
\end{aligned}
\]

Rose was born on a Friday. ▲

J. H. Conway found an even simpler calendar formula. The day of the week
on which the last day of February occurs is called the doomsday of the year.
We can compute doomsdays using Corollary 4.51.

Knowing the doomsday $D$ of a century year $100C$ finds the doomsday $D'$
of any other year $y = 100C + N$ in that century. Since $100C$ is a century
year, the number of leap years from $100C$ to $y$ does not involve the Gregorian
alteration. Thus,

\[ D' = D + N + \lfloor N/4 \rfloor \bmod 7. \]

For example, since doomsday 1900 is Wednesday = 3, we see that doomsday 
1994 is Monday = 1, for 

\[ 3 + 94 + 23 = 120 \equiv 1 \bmod 7. \]


February 29, 1600     2     Tuesday
February 28, 1700     0     Sunday
February 28, 1800     5     Friday
February 28, 1900     3     Wednesday
February 29, 2000     2     Tuesday

Figure 4.5. Recent doomsdays.


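Proposition 4.53 (Conway). Let $D$ be doomsday $100C$, and let $0 \le N \le 99$.
If $N = 12q + r$, where $0 \le r < 12$, then $D'$, doomsday $100C + N$, is given
by

\[ D + q + r + \lfloor r/4 \rfloor \bmod 7. \]

Proof.

\[
\begin{aligned}
D' &= D + N + \lfloor N/4 \rfloor \\
   &= D + 12q + r + \lfloor (12q + r)/4 \rfloor \\
   &= D + 15q + r + \lfloor r/4 \rfloor \\
   &\equiv D + q + r + \lfloor r/4 \rfloor \bmod 7. \quad ■
\end{aligned}
\]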

For example, what is $D'$ = doomsday 1994? Now $N = 94 = 12 \times 7 + 10$,
so that $q = 7$ and $r = 10$. Thus, $D' = 3 + 7 + 10 + 2 = 22 \equiv 1 \bmod 7$; that is,
doomsday 1994 is Monday, as we saw above.
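
Conway's rule is short enough to sketch in code. The names below are ours. The table `CENTURY_DOOMSDAY` extends Figure 4.5 by assuming that century doomsdays depend only on $C \bmod 4$, which holds because the Gregorian calendar repeats every 400 years. The helper `last_day_of_february` recomputes each doomsday from Python's `datetime` module as an independent check.

```python
from datetime import date, timedelta

# Century doomsdays from Figure 4.5, indexed by C mod 4 (1600 -> 2, 1700 -> 0, ...).
CENTURY_DOOMSDAY = {0: 2, 1: 0, 2: 5, 3: 3}

def doomsday(year: int) -> int:
    """Doomsday of `year` (0 = Sunday) via Proposition 4.53."""
    C, N = divmod(year, 100)   # year = 100C + N
    q, r = divmod(N, 12)       # N = 12q + r
    return (CENTURY_DOOMSDAY[C % 4] + q + r + r // 4) % 7

def last_day_of_february(year: int) -> int:
    """Weekday (0 = Sunday) of the last day of February, from datetime."""
    d = date(year, 3, 1) - timedelta(days=1)
    return (d.weekday() + 1) % 7   # datetime has Monday = 0

print(doomsday(1994), doomsday(1776), doomsday(1909))  # 1 4 0
```

Splitting $N$ by twelves keeps every intermediate number small, which is what makes the rule usable for mental arithmetic.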

Once we know doomsday of a particular year, we can use various tricks 
(e.g., Uncle Charles) to pass from doomsday to any other day in the year. Con- 
way observed that some other dates falling on the same day of the week as the 
doomsday are 

April 4, June 6, August 8, October 10, December 12, 

May 9, July 11, September 5, and November 7. 

If we return to the everyday listing beginning with January as the first month, 
then it is easier to remember these dates using the notation month/day: 


4/4, 6/6, 8/8, 10/10, 12/12, 

5/9, 7/11, 9/5, 11/7. 
Since doomsday corresponds to the last day of February, we are now within
a few weeks of any date in the year, and we can easily interpolate to find the
desired day. For example, let’s use this method for July 4, 1776. Notice that
July 4 occurs on the same day of the week as July 11, and so we need only find
doomsday 1776. Since doomsday 1700 is 0, the formula
$D' = D + N + \lfloor N/4 \rfloor$ preceding Proposition 4.53 gives

\[ D' = 0 + 76 + \lfloor 76/4 \rfloor = 95 \equiv 4 \bmod 7. \]

We see again that July 4, 1776 fell on a Thursday.

Example 4.54. Let’s use Conway’s method to compute Rose’s birthday again 
(recall Example 4.52: Rose was born on January 1, 1909). Since Conway’s 
method applies within a given century, there is no need to pretend that Jan- 
uary and February live in the preceding year; we can work within 1909. Now 
doomsday 1900 is 3, so that Proposition 4.53 gives doomsday 1909 = 0; that 
is, Sunday. By definition, doomsday is the number corresponding to the last 
date in February, which is here February 28 (for 1909 is not a leap year). Thus, 
we interpolate that 0 is the number for 1/31, 1/24, 1/3; that is, January 3 fell
on Sunday, and so January 1 fell on Friday (which agrees with what we saw in
Example 4.52). ▲ 


Exercises 

4.76 A suspect said that he had spent the Easter holiday April 21, 1893, with his ailing 
mother; Sherlock Holmes challenged his veracity at once. How could the great 
detective have been so certain? 

Hint. Easter always falls on Sunday. (There is a Jewish variation of this problem, 
for Yom Kippur must fall on either Monday, Wednesday, Thursday, or Saturday; 
secular variants can involve Thanksgiving Day, which always falls on a Thursday,
or Election Day in the US, which always falls on a Tuesday.) 

4.77 How many times in 1900 did the first day of a month fall on a Tuesday? 

Hint. The year $y = 1900$ was not a leap year.

4.78 On what day of the week did February 29, 1896 fall? 

Hint. On what day did March 1, 1896, fall? Conclude from your method of solu- 
tion that no extra fuss is needed to find leap days. 

4.79 * 

(i) Show that 1987 had three Friday 13s. 

Hint. See Example 4.50. 

(ii) Show, for any year $y > 0$, that $g(y) - g(y - 1) = 1$ or 2, where
$g(y) = y + \lfloor y/4 \rfloor - \lfloor y/100 \rfloor + \lfloor y/400 \rfloor$.

(iii) Can there be a year with exactly one Friday 13? 

Hint. Either use congruences or scan the 14 possible calendars: there are 7 
possible common years and 7 possible leap years, for January 1 can fall on 
any of the 7 days of the week. 

4.80 * JJR’s Uncle Ben was born in Pogrebishte, a village near Kiev, and he claimed
that his birthday was February 29, 1900. JJR told him that this could not be, for 
1900 was not a leap year. Why was JJR wrong? 

Hint. When did Russia adopt the Gregorian calendar? 
4.5 Connections: Patterns in Decimal Expansions

One of the most beautiful applications of modular arithmetic is to the classi- 
fication of decimal expansions of rational numbers, a circle of ideas that runs 
throughout precollege mathematics. 

We now ask what we can infer from knowing the decimal expansion of a 
real number x. You probably know Proposition 4.58: x is rational if and only 
if its decimal expansion either terminates or eventually repeats. Is there any 
nice way to tell ahead of time which fractions terminate? Can you be sure 
that the ones that don’t terminate really do repeat? And, for fractions whose 
decimals repeat, can you predict the period (the number of digits in its block, 
the repeating part) as well as the actual sequence of digits in it? 

Many conjectures about fractions and decimals come from a careful anal- 
ysis of numerical calculations. In this section, we (and you) will perform a 
great many calculations, looking at patterns you’ll observe, with the goal of 
analyzing them, and seeing how they are explained by “how the calculation 
goes.” 

Real Numbers 

We assume that every real number $x$ has a decimal expansion; for example,
$-\pi = -3.14159\ldots$. This follows from identifying each real number $x$ with
a “point on a number line” having signed distance from a fixed origin on a
coordinatized line. In particular, rational numbers have decimal expansions,
which you can find by long division.

The term expansion will be used in a nonstandard way: we restrict the
terminology so that, from now on, the decimal expansion of a real number is the
sequence of digits after the decimal point. With this usage, for example, the
decimal expansion of $-\pi$ is $.14159\ldots$.

We are going to see that decimal expansions of real numbers are unique, 
with one possible exception: if there is an infinite string of all 9s. For example, 

.328 = .327999.... 

This is explained using the geometric series. 

Lemma 4.55. If $r$ is a real number with $|r| < 1$, then

\[ \sum_{n=0}^{\infty} r^n = 1 + r + r^2 + \cdots = \frac{1}{1 - r}. \]

Proof. For every positive integer $n$, the identity

\[ 1 - r^n = (1 - r)(1 + r + r^2 + \cdots + r^{n-1}) \]

gives the equation

\[ 1 + r + r^2 + \cdots + r^{n-1} = \frac{1 - r^n}{1 - r} = \frac{1}{1 - r} - \frac{r^n}{1 - r} \]

for every real number $r \ne 1$. Since $|r| < 1$, we have
$\lim_{n \to \infty} r^n/(1 - r) = 0$. ■
For example, taking $r = 1/10$, we have

\[
.999\ldots = \frac{9}{10} + \frac{9}{10^2} + \frac{9}{10^3} + \cdots
  = \frac{9}{10}\Bigl(1 + \frac{1}{10} + \frac{1}{10^2} + \cdots\Bigr)
  = \frac{9}{10} \cdot \frac{1}{1 - 1/10}
  = \frac{9}{10} \cdot \frac{10}{9} = 1.
\]

Hence,

\[
.327999\ldots = .327 + .000999\ldots = .327 + \frac{1}{10^3}(.999\ldots)
  = .327 + \frac{1}{10^3} = .327 + .001 = .328.
\]

We’ll resolve this ambiguity by choosing, once for all, to avoid infinite strings 
of 9s. Indeed, we’ll soon see that the choice .328 comes from long division. 

If we disregard “all nines from some point on,” then we can show that every 
real number has a unique decimal expansion. For this, we need the following 
corollary to Lemma 4.55. 


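Corollary 4.56. If $x = .d_1 d_2 \ldots$ and $d_j < 9$ for some $j > 1$, then

\[ x < \frac{d_1 + 1}{10}. \]

Proof. Each digit $d_j$ is at most 9, and there is some $j > 1$ with $d_j$ strictly less
than 9. So, writing $x$ as a series, we have

\[
\begin{aligned}
x &= \frac{d_1}{10} + \frac{d_2}{10^2} + \frac{d_3}{10^3} + \cdots + \frac{d_j}{10^j} + \cdots \\
  &< \frac{d_1}{10} + \frac{d_2}{10^2} + \frac{d_3}{10^3} + \cdots + \frac{9}{10^j} + \cdots \\
  &\le \frac{d_1}{10} + \frac{9}{10^2} + \frac{9}{10^3} + \cdots + \frac{9}{10^j} + \cdots \\
  &= \frac{d_1}{10} + \frac{9}{10^2}\Bigl(1 + \frac{1}{10} + \frac{1}{10^2} + \cdots\Bigr) \\
  &= \frac{d_1}{10} + \frac{9}{10^2} \cdot \frac{10}{9} \\
  &= \frac{d_1}{10} + \frac{1}{10} = \frac{d_1 + 1}{10}. \quad ■
\end{aligned}
\]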

Proposition 4.57. Every real number x has a unique decimal expansion that 
does not end with infinitely many consecutive 9s. 


Proof. Suppose that

\[ .d_1 \ldots d_k \ldots = x = .e_1 \ldots e_k \ldots, \]

where $d_1 = e_1, \ldots, d_{k-1} = e_{k-1}$, but that $d_k \ne e_k$. We may assume that
$d_k < e_k$, so that $d_k + 1 \le e_k$. Multiplying by a power of 10, we see that

\[ .d_k d_{k+1} \ldots = .e_k e_{k+1} \ldots. \]

Because there’s not an infinite string of 9s in our expansions, we can apply
Corollary 4.56 to find that

\[ .d_k d_{k+1} \ldots < \frac{d_k + 1}{10} \le \frac{e_k}{10} \le .e_k e_{k+1} \ldots, \]

contradicting the fact that the extreme left-hand and right-hand expressions are
equal. ■


Decimal Expansions of Rationals 

Let’s now focus on rational numbers. Some decimal expansions of rationals 
terminate; for example, 


\[ \frac{1}{10} = .1, \qquad \frac{1}{40} = .025. \]

And there are some fractions whose decimal expansions repeat (after a possible
initial string of digits):

\[ \frac{1}{3} = .333\ldots, \qquad \frac{1}{7} = .142857142857\ldots, \qquad \frac{9}{28} = .32142857142857\ldots. \]


You can think of
.32142857142857... as
the 10-adic expansion
of 9/28, using negative
powers of 10.

Definition. Let a real number $x$ have decimal expansion

\[ r = .d_1 d_2 d_3 \ldots; \]

that is, $x = k.d_1 d_2 d_3 \ldots = k + r$ for some $k \in \mathbb{Z}$.

(i) We say that $x$ terminates if there exists an integer $N$ so that $d_j = 0$ for
all $j > N$.

(ii) We say that $x$ repeats with period $m \ge 1$ if

(a) it doesn’t terminate;

(b) there exist positive integers $N$ and $m$ so that $d_i = d_{i+m}$ for all $i > N$;

(c) $m$ is the smallest such integer.

If $x$ repeats, then its block is the first occurrence of its repeating part

\[ d_i d_{i+1} \ldots d_{i+m-1}. \]

We could say that “terminating” and “repeating” decimals are not really 
different, for terminating rationals have decimal expansions that repeat with 
period 1 and with block having the single digit 0, but it’s convenient and natural 
to distinguish such rationals from those having infinitely many nonzero digits, 
as you’ll see in Proposition 4.59. 

The way to get the decimal expansion for 1/7 is to divide 7 into 1 via long 
division, as in Figure 4.6. Each of the remainders 1 through 6 shows up exactly 
once in this calculation, in the order 3, 2, 6, 4, 5, 1. Once you get a remainder 
of 1, the process will start over again, and the digits in the quotient, namely, 
1, 4, 2, 8, 5, 7, will repeat. The block of 1/7 is 142857. However, even though
142857142857 also repeats, it is not a block because it is too long: 1/7 has
period 6, not 12.

      0.142857...
  7 ) 1.000000...
        7
        30
        28
         20
         14
          60
          56
           40
           35
            50
            49
             1

Figure 4.6. 1/7 = .142857142857142857....

      0.153846...
 13 ) 2.000000...
       13
        70
        65
         50
         39
         110
         104
           60
           52
            80
            78
             2

Figure 4.7. 2/13 = .153846....

Consider a second example: the calculation of 2/13 in Figure 4.7. It too has 
period 6. 
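
The long divisions in Figures 4.6 and 4.7 can be replayed step by step. The following is a minimal sketch; the function name `long_division` and its return convention (quotient digits and remainders as two lists) are our own choices.

```python
def long_division(a: int, b: int, steps: int):
    """First `steps` quotient digits and remainders of a/b, assuming 0 < a < b."""
    digits, remainders = [], []
    r = a
    for _ in range(steps):
        # Bring down a 0 (multiply the remainder by 10), then divide by b.
        q, r = divmod(10 * r, b)
        digits.append(q)
        remainders.append(r)
    return digits, remainders

print(long_division(1, 7, 6))   # ([1, 4, 2, 8, 5, 7], [3, 2, 6, 4, 5, 1])
print(long_division(2, 13, 6))  # ([1, 5, 3, 8, 4, 6], [7, 5, 11, 6, 8, 2])
```

In both runs the last remainder equals the numerator we started with, which is exactly why the digits begin to repeat from that point on.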

Next, we’ll see that every rational number terminates or repeats; that is, the 
two types in the definition are the only possibilities. 


Proposition 4.58. A real number $x$ is rational if and only if it either terminates
or repeats. Moreover, if $x = a/b$ is rational, then it has period at most $b$.


Middle school students 
practice another method 
for doing this (for days on 
end). See Exercise 4.84 on 
page 181. 


Proof. The arguments for 1/7 and 2/13 generalize. Imagine expressing a
fraction $a/b$ (with $a, b > 0$) as a decimal by dividing $b$ into $a$ via long division.
There are at most $b$ possible remainders in this process (integers between 0
and $b - 1$), so after at most $b$ steps a remainder appears that has shown up
before. After that, the process repeats.

Conversely, let’s see that if a real number x terminates or repeats, then x is 
rational. A terminating decimal is just a fraction whose denominator is a power 
of 10, while a repeating decimal is made up of such a fraction plus the sum of 
a convergent geometric series. An example is sufficient to see what’s going on. 


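\[
\begin{aligned}
.1323232\ldots &= .1 + .0323232\ldots \\
  &= \frac{1}{10} + \frac{32}{10^3} + \frac{32}{10^5} + \frac{32}{10^7} + \cdots \\
  &= \frac{1}{10} + \frac{32}{10^3}\Bigl(1 + \frac{1}{10^2} + \frac{1}{10^4} + \cdots\Bigr) \\
  &= \frac{1}{10} + \frac{32}{10^3} \cdot \frac{1}{1 - 1/10^2} \qquad \text{(by Lemma 4.55)}.
\end{aligned}
\]

The last expression is clearly a rational number. The general proof is a generic
version of this idea; it is left as Exercise 4.81 below. ■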
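
The geometric-series argument in the proof can be carried out exactly with Python's `fractions` module. In this sketch the function name `from_repeating` and its two string arguments (the digits before the repeating part, and the block) are our own conventions, not the book's.

```python
from fractions import Fraction

def from_repeating(prefix: str, block: str) -> Fraction:
    """Exact value of the decimal .<prefix><block><block><block>..."""
    k, m = len(prefix), len(block)
    head = Fraction(int(prefix), 10**k) if prefix else Fraction(0)
    # The tail is (block / 10^(k+m)) * (1 + r + r^2 + ...) with r = 10^(-m);
    # Lemma 4.55 sums the series to 1 / (1 - r).
    tail = Fraction(int(block), 10**(k + m)) / (1 - Fraction(1, 10**m))
    return head + tail

print(from_repeating("1", "32"))    # 131/990, the value of .1323232...
print(from_repeating("", "324"))    # 12/37, that is, 324/999 in lowest terms
print(from_repeating("327", "9"))   # 41/125, which equals .328
```

The last call illustrates the ambiguity discussed earlier: the expansion .327999... and the terminating decimal .328 name the same rational number.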
Which rationals terminate? Certainly, any rational $a/b$ whose denominator
$b$ is a power of 10 terminates. But some other rationals can also be put in
this form; for example,

\[ \frac{5}{8} = \frac{5 \cdot 125}{8 \cdot 125} = \frac{625}{1000} = .625. \]

The basic idea is to take a denominator of the form $2^u 5^v$, and multiply top and
bottom of the fraction to produce a denominator that’s a power of 10.

Proposition 4.59. Let $x = a/b$ be rational, written in lowest terms. Then $x$
terminates if and only if the only prime factors of $b$ are 2 and 5.

Proof. If $x$ terminates, say, $x = k.d_1 d_2 \ldots d_m$, then $x = k + D/10^m$ for some
$k \in \mathbb{Z}$, where $D$ is the integer with digits $d_1 d_2 \ldots d_m$; thus, $x$ is a fraction
whose denominator is divisible only by 2 and 5. Conversely, if $x = k + r =
k + a/2^u 5^v$, then

\[ r = \Bigl(\frac{2^v 5^u}{2^v 5^u}\Bigr)\Bigl(\frac{a}{2^u 5^v}\Bigr) = \frac{2^v 5^u a}{10^{u+v}}, \]

a fraction whose denominator is a power of 10. Hence, $x$ terminates. ■
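
Proposition 4.59 translates directly into a test. The sketch below uses a function name of our own, `terminates`; it reduces the fraction to lowest terms first, then strips every factor of 2 and 5 from the denominator.

```python
from math import gcd

def terminates(a: int, b: int) -> bool:
    """True if a/b (with b > 0) has a terminating decimal expansion."""
    b //= gcd(a, b)              # pass to lowest terms
    for p in (2, 5):
        while b % p == 0:        # remove all factors of 2, then all factors of 5
            b //= p
    return b == 1                # nothing but 2s and 5s divided b

print(terminates(5, 8), terminates(1, 40))   # True True
print(terminates(1, 3), terminates(9, 28))   # False False
print(terminates(7, 28))                     # True, since 7/28 = 1/4
```

The reduction to lowest terms matters: 7/28 terminates even though 28 has the prime factor 7.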



How to Think About It. Exercise 4.83 shows that if $r$ is a rational number
and $5^m r$ or $2^\ell r$ terminates, then $r$ also terminates. However, if $kr$ terminates
(for some integer $k$), then $r$ need not terminate; for example, $r = .271333\ldots$
does not terminate, but $3r = .814$ does terminate.


Exercises 

4.81 * Complete the proof of Proposition 4.58 that a decimal that eventually repeats is 
the decimal expansion of a rational number. 

4.82 * Let r = a/b be rational. 

(i) If r terminates, then kr terminates for every integer k. 

(ii) If $\gcd(a, b) = 1$, prove that $a/b$ terminates if and only if $1/b$ terminates.
Hint. $1/b = (sa + tb)/b = sa/b + t$.

4.83 * If $\ell \ge 0$, $m \ge 0$, and $2^\ell 5^m r$ terminates, prove that $r$ terminates.

4.84 * Here’s a method used by many precollege texts for converting repeating
decimals to fractions. Suppose that you want to convert .324324... to a fraction.
Calculate like this: If $x = .324324\ldots$, then $1000x = 324.324324\ldots$, and

\[ 1000x - x = 999x = 324. \]

Hence, $x = 324/999$.

(i) There is a hidden assumption about geometric series in this method. Where is 
it? 

(ii) Try this method with the following decimal expansions: 

(a) .356356... (b) .5353 . . . 

(c) .2222 ... (d) .07593 . . . 

(e) .0123563563... 


Theorem 4.61 below gives 
a necessary and sufficient 
condition for a/b to repeat. 
4.85 What’s wrong with the following calculation? Let $x = 1 + 2 + 2^2 + \cdots$. Then

\[ x = 1 + 2 + 2^2 + \cdots \]

\[ 2x = 2 + 2^2 + 2^3 + \cdots. \]

Subtract the top equation from the bottom to obtain $x = -1$.

4.86 Calculate decimal expansions for the following fractions using long division. For
each one, what other fraction-to-decimal expansions (if any) do you get for free?


® 3 

(11) 6 

(iii) 

1 

4 


(iv) T7 

(v) 77 

(vi) 

15 

15 


1 

1 


(vii) - 
8 

(viii) — 

13 

(ix) 

1 

1 


(x) — 

19 

(xi) — 

31 

(xii) 


Corollary 4.62 says that
the period of $a/n$ is equal
to the period of $1/n$ if
$\gcd(a, n) = 1$.


Periods and Blocks 

What is the period of a “unit fraction” 1 /n? Our result will come from taking a 
closer look at how decimal expansions are calculated; the analysis generalizes 
to the decimal expansion of any rational number. 


1/13 = 0.076923...     2/13 = 0.153846...

Figure 4.8. Decimal expansions of 1/13 and 2/13.


An analysis of the calculation for 1/13 yields another insight (see Figure
4.8). Pretend that the decimal point isn’t there, so we are dividing
$1{,}000{,}000 = 10^6$ by 13. Since 1 appears as a remainder, the initial sequence
of remainders will repeat, and the period of 1/13 is 6. Thus, the period of 1/13
is the smallest positive exponent $e$ for which $10^e$ is congruent to 1 mod 13. In
other words, the period of 1/13 is the order of 10 in $\mathbb{Z}_{13}$ (see Exercise 4.49 on
page 165).

We will generalize this observation in Theorem 4.61: the period of any
fraction $1/n$ is the order of 10 in $\mathbb{Z}_n$ as long as there is some positive integer $e$
with $10^e \equiv 1 \bmod n$. But, by Exercise 4.48 on page 165, some power of 10 is
congruent to 1 mod $n$ if and only if 10 is a unit in $\mathbb{Z}_n$. Now this condition is
equivalent to $\gcd(10, n) = 1$; that is, $n$ is divisible by neither 2 nor 5, so that,
in particular, $n > 1$ is not of the form $2^u 5^v$.
Thus, Proposition 4.59 shows why the dichotomy of terminating and repeating
rationals is so natural.
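
Computing the order of 10 in $\mathbb{Z}_n$ takes only a short loop. The sketch below uses our own function name, `order_of_10`, and returns `None` when 10 is not a unit mod $n$, a convention chosen here for illustration.

```python
from math import gcd

def order_of_10(n: int):
    """Smallest e >= 1 with 10^e congruent to 1 mod n, or None if 10 is not a unit mod n."""
    if n < 2 or gcd(10, n) != 1:
        return None
    e, power = 1, 10 % n
    while power != 1:
        power = (power * 10) % n   # next power of 10, reduced mod n
        e += 1
    return e

print(order_of_10(13), order_of_10(7), order_of_10(3))  # 6 6 1
print(order_of_10(12))                                   # None, since gcd(10, 12) = 2
```

The loop must stop when $\gcd(10, n) = 1$, because the powers of a unit in $\mathbb{Z}_n$ eventually return to 1.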

To prove the general result for l/n, we just need to make sure that the first 
remainder that shows up twice is, in fact, 1. That’s the content of the next 
lemma. 

Lemma 4.60. If $\gcd(10, n) = 1$, then 1 occurs as a remainder in the long
division of 1 by $n$; moreover, there cannot be two identical remainders occurring
before 1 occurs.

Proof. As we saw above, because $\gcd(10, n) = 1$, a remainder of 1 will first
appear in the long division after $e$ steps, where $e$ is the order of 10 in $\mathbb{Z}_n$.
We must prove that there is no repeat of some other remainder before that
remainder of 1 shows up. First of all, there can’t be an earlier 1 (why?). Next,
suppose you see the same remainder, say $c$, occurring earlier, say, at steps
$e_1 < e_2 < e$. Then we’d have

\[ 10^{e_1} \equiv 10^{e_2} \equiv c \bmod n. \]

Since 10 is a unit in $\mathbb{Z}_n$, this would imply that

\[ 1 \equiv 10^{e_2 - e_1} \bmod n. \]

And since $e_2 - e_1 < e$, this would contradict the fact that $e$ is the order of 10
in $\mathbb{Z}_n$. ■

Putting it all together, we have a refinement of Propositions 4.58 and 4.59: 

Theorem 4.61. If $n > 0$ is an integer, then $1/n$ either terminates or repeats.

(i) $1/n$ terminates if and only if $n = 2^u 5^v$ for nonnegative integers $u$ and $v$.

(ii) If $\gcd(n, 10) = 1$, then $1/n$ repeats with period $m$, where $m$ is the order
of 10 in $\mathbb{Z}_n$.

Proof. Part (i) was proved in Proposition 4.59. The essence of the proof of
part (ii) lies in the discussion on page 182 about the decimal expansion of 1/13:
the expansion for $1/n$ repeats after $e$ steps, where $e$ is the order of 10 in $\mathbb{Z}_n$;
that is, the first occurrence of remainder 1 occurs at the $e$th step of the long
division. And Lemma 4.60 shows that there can be no earlier occurrences. ■

So, if $\gcd(n, 10) = 1$, then $1/n$ repeats, and we know that its period is the
order of 10 in $\mathbb{Z}_n$. What about fractions of the form $a/n$? The next corollary
shows that the same thing is true, as long as the fraction is in lowest terms.

Corollary 4.62. If $\gcd(a, n) = 1$ and $a < n$, then the period of $a/n$ is the
same as that of $1/n$, namely the order of 10 in $\mathbb{Z}_n$.

Proof. Suppose the period of $a/n$ is $\ell$. Then, arguing as in Lemma 4.60, the
expansion will repeat only after the remainder $a$ occurs in the long division
of $a$ by $n$ (see Exercise 4.88 on page 190). But this implies that $\ell$ is the smallest
positive integer such that

\[ a \cdot 10^{\ell} \equiv a \bmod n. \]

Since $a$ is a unit in $\mathbb{Z}_n$, multiplying by $a^{-1}$ gives

\[ 10^{\ell} \equiv 1 \bmod n. \]

It follows that $\ell = m$, the order of 10 in $\mathbb{Z}_n$. ■

Figure 4.9. Periods of $1/p$ for small primes $p$.

Theorem 4.61 doesn’t answer every question about the periods of $1/p$,
where $p$ is a prime other than 2 or 5. Sometimes the period is $p - 1$, as when
$p = 7$, but this is not always so, for 1/13 has period 6, not 12. In all the entries in
Figure 4.9, we see that periods of $1/p$ are divisors of $p - 1$. This turns out to
be always true, and you’ll prove it soon. What about non-prime denominators?
Perhaps the length of the period of the expansion of $1/n$ is a factor of $n - 1$?
No such luck: $1/21 = .047619047619\ldots$ has period 6, which is not a divisor
of 20.

But stay tuned; we’ll
return to the period of $1/n$
shortly.
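
The evidence that the period of $1/p$ divides $p - 1$ can be regenerated by machine. In the sketch below the function names are ours, and the range of primes (below 50) is our choice and need not match the entries of Figure 4.9. The period is found by counting long-division steps until the remainder 1 returns, which assumes $\gcd(10, n) = 1$ and $n > 1$; otherwise the loop would never stop.

```python
def period_of_unit_fraction(n: int) -> int:
    """Period of 1/n, assuming gcd(10, n) = 1 and n > 1."""
    steps, r = 0, 1
    while True:
        r = (10 * r) % n           # one step of the long division of 1 by n
        steps += 1
        if r == 1:                 # remainder 1 is back: the digits now repeat
            return steps

def is_prime(p: int) -> bool:
    return p > 1 and all(p % d for d in range(2, int(p**0.5) + 1))

for p in [q for q in range(3, 50) if is_prime(q) and q != 5]:
    e = period_of_unit_fraction(p)
    print(f"1/{p}: period {e}, divides p - 1: {(p - 1) % e == 0}")

# The composite denominator 21 breaks the pattern: its period does not divide 20.
print(period_of_unit_fraction(21), 20 % period_of_unit_fraction(21))  # 6 2
```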

Historical Note. In Disquisitiones Arithmeticae [14], Gauss conjectured
that there are infinitely many primes $p$ that have the property that the
decimal expansion for $1/p$ has period $p - 1$. Gauss’s conjecture can be restated as
follows: there are infinitely many primes $p$ for which the order of 10 in $\mathbb{Z}_p$ is
$p - 1$. E. Artin generalized Gauss’s conjecture. He claimed that if $b$ is a
positive integer that is not a perfect square, then there are infinitely many primes $p$
for which the $b$-adic expansion of $1/p$ has period $p - 1$. These are still
conjectures (as Gauss’s conjecture above), and very celebrated ones at that. Many
seemingly simple questions in arithmetic are extremely hard to answer.

But some things are known. For example, Gauss proved in Disquisitiones
that for any prime $p$, there is always at least one number (not necessarily 10)
whose order in $\mathbb{Z}_p$ is $p - 1$. Such a number is called a primitive root mod $p$.


We now know that the period of $1/n$, where $\gcd(10, n) = 1$, is the order of
10 in $\mathbb{Z}_n$. In Exercise 4.49 on page 165, you did some calculations of orders of
units. We can now say a little more.

Theorem 4.63. If $u$ is a unit in $\mathbb{Z}_n$, then

\[ u^{\phi(n)} = 1, \]

where $\phi$ is the Euler $\phi$-function.


Proof. By Proposition 4.39, there are $\phi(n)$ units in $\mathbb{Z}_n$. Suppose we list them
all:

\[ u_1, u_2, \ldots, u_{\phi(n)}. \]

One of these units is $u$. Now multiply all these units by $u$; you get

\[ uu_1, uu_2, \ldots, uu_{\phi(n)}. \]

All these elements are units (Exercise 4.44 on page 165), and they are distinct
(Exercise 4.89 on page 190). This means that the second list contains all the
units, perhaps in a different order (they are distinct units, and there are $\phi(n)$
of them). Now multiply all the units together, first using the original order, and
then using the permuted order:

\[ \prod_{i=1}^{\phi(n)} u_i = \prod_{i=1}^{\phi(n)} u u_i = u^{\phi(n)} \prod_{i=1}^{\phi(n)} u_i. \]

But $\prod_{i=1}^{\phi(n)} u_i$ is a unit (Exercise 4.44 again), so you can divide both sides by
it, and the result follows. ■

Corollary 4.64. The order of a unit in $\mathbb{Z}_n$ is a factor of $\phi(n)$.

Proof. Suppose that $u$ is a unit in $\mathbb{Z}_n$ with order $e$.
Divide $\phi(n)$ by $e$ to get a quotient and remainder:

\[ \phi(n) = qe + r, \qquad 0 \le r < e. \]

Then

\[ u^{\phi(n)} = u^{qe + r} = (u^e)^q u^r. \]

Now use Theorem 4.63 and the fact that $e$ is the minimal positive exponent
such that $u^e = 1$ to conclude that $r = 0$. ■
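
Both Theorem 4.63 and Corollary 4.64 can be spot-checked numerically. The sketch below, with function names of our own, lists the units of $\mathbb{Z}_n$ by brute force, confirms that multiplying by a unit merely permutes that list (the key step of the proof), and verifies the two statements for small moduli.

```python
from math import gcd

def units(n: int) -> list:
    """The units of Z_n for n >= 2, as integers between 1 and n - 1."""
    return [u for u in range(1, n) if gcd(u, n) == 1]

def phi(n: int) -> int:
    """Euler phi-function, computed by counting units."""
    return len(units(n))

def order(u: int, n: int) -> int:
    """Order of the unit u in Z_n."""
    e, power = 1, u % n
    while power != 1:
        power = (power * u) % n
        e += 1
    return e

n = 21
for u in units(n):
    # Multiplication by u permutes the list of units.
    assert sorted(u * v % n for v in units(n)) == units(n)
    assert pow(u, phi(n), n) == 1          # Theorem 4.63
    assert phi(n) % order(u, n) == 0       # Corollary 4.64
print(phi(21), order(10, 21))  # 12 6
```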


See Exercises 4.92 
and 4.93 on page 190. 
Specializing to $u = 10$, we have

Theorem 4.65. If $n$ is relatively prime to 10 (that is, if $1/n$ repeats), then the
period of $1/n$ is a divisor of $\phi(n)$.

This greatly reduces the number of possibilities. For example, all we could
say about the length of the period of 1/231 before is that it is at most 230. Now
we can say it is a factor of $\phi(231) = 120$. Which one is it?

Example 4.66. We saw earlier, on page 184, that $1/21 = .047619047619\ldots$,
so that the period of 1/21 is 6. Now $\phi(21) = 12$ and, of course, 6 is a divisor
of 12. ▲
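
Theorem 4.65 suggests a shortcut for finding a period: only the divisors of $\phi(n)$ need to be tried. Here is a minimal sketch, with names of our own and a deliberately naive computation of $\phi(n)$; it uses Python's built-in three-argument `pow` for modular exponentiation.

```python
from math import gcd

def phi(n: int) -> int:
    """Euler phi-function by direct count (fine for small n)."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def period_via_phi(n: int) -> int:
    """Period of 1/n for n > 1 with gcd(10, n) = 1, testing only divisors of phi(n)."""
    if n < 2 or gcd(10, n) != 1:
        raise ValueError("need n > 1 with gcd(10, n) = 1")
    f = phi(n)
    for d in range(1, f + 1):
        # Skip exponents that do not divide phi(n); they cannot be the order.
        if f % d == 0 and pow(10, d, n) == 1:
            return d

print(period_via_phi(231))  # 6
print(period_via_phi(21))   # 6
```

For 231 the candidates 1, 2, 3, 4, 5 all fail, and $10^6 = 4329 \cdot 231 + 1$, so the period of 1/231 is 6.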

Theorem 4.63 gives us an added bonus: another proof of Fermat’s Little
Theorem.

Corollary 4.67. If $p$ is a prime, then $a^p = a$ in $\mathbb{Z}_p$ for all integers $a$.

Proof. As in the proof of Theorem 4.9, we have two cases. If $p \mid a$, then

\[ a^p \equiv a \equiv 0 \bmod p, \]

and $a^p = a$ in $\mathbb{Z}_p$.

If $\gcd(a, p) = 1$, then $a$ is a unit in $\mathbb{Z}_p$, and Theorem 4.63 gives

\[ a^{\phi(p)} = 1 \]

in $\mathbb{Z}_p$. But $\phi(p) = p - 1$, because $p$ is prime. Hence,

\[ a^{p-1} = 1. \]

Multiplying both sides by $a$ gives $a^p = a$ in $\mathbb{Z}_p$. ■

As another application, we know that the period of $1/n$ is at most $n - 1$.
When is it as large as possible?

Corollary 4.68. If the period of $1/n$ is $n - 1$, then $n$ is prime.

Proof. If $n$ is not prime, then $\phi(n) < n - 1$, and the period of $1/n$ is not
$n - 1$. ■


How to Think About It. The converse of Corollary 4.68 is not true, as the
example of 1/13 shows. As we said on page 184, it’s still an open question
about which primes $p$ have the property that the decimal expansion for $1/p$
has maximal period. All we can say is that the period of the decimal expansion
is a divisor of $\phi(p) = p - 1$, providing an explanation for the evidence gathered in
Figure 4.9.


We have discovered information about periods of repeating rationals; let’s 
now look a bit at their blocks. Before continuing, it’s worth working out some 
other decimal expansions to look for interesting patterns. For example,
calculate the decimal expansions of

\[ \frac{1}{8}, \quad \frac{2}{3}, \quad \frac{1}{15}, \quad \frac{1}{19}, \quad \frac{2}{19}, \quad \frac{1}{13}, \quad \frac{1}{20} \]

to see whether you can come up with some conjectures for connections
between the integers $a$ and $b$ and the blocks in the decimal expansion of $a/b$.
Figure 4.10 displays the digits in the blocks of $k/7$ for $1 \le k \le 6$. Is there a
way to explain where each rearrangement starts?

1/7 = .142857142857...
2/7 = .285714285714...
3/7 = .428571428571...
4/7 = .571428571428...
5/7 = .714285714285...
6/7 = .857142857142...

Figure 4.10. The expansions of $k/7$ for $1 \le k < 7$.

There are quite a few patterns here. For example, each block consists of six
repeating digits, some “cyclic” permutation of 142857:

142857, 285714, 428571, 571428, 714285, 857142.

It’s the sequence of remainders that explains the various decimal expansions
of $k/7$: what they are and why they are in a particular order. For example, in
calculating 6/7, you look down the remainder list and see where you get a 6.
The process for 6/7 will start there, as in Figure 4.11.

The point of Figure 4.11 is that you can “pick up” the calculation at any step
in the process; in a way, the calculation of 6/7 is embedded in the calculation
of 1/7. So are the calculations for all the other $k/7$ for $2 \le k \le 5$.

So, the sequence of remainders in a long division provides the key to which
decimal expansions can be obtained from the same long division. For the
rationals $k/7$, there were six remainders before things started to repeat, so we
get all the expansions $1/7, 2/7, \ldots, 6/7$ from one calculation. But it isn’t
always the case that you get all the expansions for $k/n$ (where $1 \le k < n$) from
the calculation of $1/n$. That only happens when the period for the decimal
expansion of $1/n$ has the maximal length $n - 1$ (implying that $n$ is prime). For
example, for the various $k/13$, you need two calculations, because the period
of the expansion for 1/13 is 6, not 12.

1/7 = 0.142857...     6/7 = 0.857142...

Figure 4.11. The expansion of 6/7 from that of 1/7.

Earlier, on page 187, we listed the blocks for the various $k/7$, noting that
there seemed to be no apparent pattern to where each block starts. In fact, a
closer analysis of the long division gives us a way to calculate the digits in each
block. Consider again the calculation of the expansion for 6/7. As before, if
we “forget” the decimal point, each new remainder gives the remainder when 6
times a power of 10 is divided by 7. Referring to Figure 4.11, we have

\[
\begin{aligned}
6 &\equiv 6 \cdot 1 \bmod 7 \\
4 &\equiv 6 \cdot 10 \bmod 7 \\
5 &\equiv 6 \cdot 100 \bmod 7 \\
1 &\equiv 6 \cdot 1000 \bmod 7 \\
3 &\equiv 6 \cdot 10000 \bmod 7 \\
2 &\equiv 6 \cdot 100000 \bmod 7 \\
6 &\equiv 6 \cdot 1000000 \bmod 7.
\end{aligned}
\]

Now, these are the remainders, not the digits in the block. Still, we have an 
interesting preliminary result. 

Lemma 4.69. Let 1 < a < n, and suppose that gcd( 1 0, n ) = gcd (a,n) = 1. 
Ife is the order of 10 in Z n , then the j th remainder in the long division calcu- 
lation ofa/n, where 0 < j < e, is the solution Cj of the congruence 

Cj = a • 10 7 mod n 


with 0 < Cj </7. 

Proof. Imagine dividing n into a with long division, up to j places. Suppose
that the remainder is $c_j$:

         .q_1 q_2 q_3 q_4 ... q_j
    n ) a.0   0   0   0   ...  0
                  ...
                               c_j

Look at the example of 2/13:

\begin{align*}
2 &= (13 \times .1) + 10^{-1} \cdot 7 \\
2 &= (13 \times .15) + 10^{-2} \cdot 5 \\
2 &= (13 \times .153) + 10^{-3} \cdot 11
\end{align*}

and so on.

This says that

\[ a = (n \times .q_1 q_2 q_3 q_4 \ldots q_j) + 10^{-j} c_j. \]

Multiply both sides of the equation by $10^j$ to find

\[ a \cdot 10^j = (n \times q_1 q_2 q_3 q_4 \ldots q_j) + c_j. \]

This says that

\[ c_j \equiv a \cdot 10^j \bmod n. \] ■
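The lemma can be checked numerically. The following is a minimal Python sketch, not part of the original text; the function name `long_division_remainders` is an illustrative assumption. It carries out long division one digit at a time and compares the remainders with $a \cdot 10^j \bmod n$.

```python
def long_division_remainders(a, n, places):
    """Return the remainders c_0, c_1, ..., c_places met while dividing a by n.

    c_0 is a itself (reduced mod n); each later remainder comes from
    "bringing down a zero", that is, from multiplying by 10 and reducing mod n.
    """
    remainders = [a % n]
    for _ in range(places):
        remainders.append((10 * remainders[-1]) % n)
    return remainders


# Lemma 4.69 for 6/7: the j-th remainder equals 6 * 10**j mod 7.
rems = long_division_remainders(6, 7, 6)
by_congruence = [(6 * 10**j) % 7 for j in range(7)]
print(rems)           # [6, 4, 5, 1, 3, 2, 6]
print(by_congruence)  # [6, 4, 5, 1, 3, 2, 6]
```

The first list agrees with the remainders read off from the long division of 6/7 above, and the same function applied to 2/13 gives the remainders 7, 5, 11 appearing in the 2/13 example.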

What about the digits in the blocks? As in Lemma 4.69, let $\gcd(10, n) = \gcd(a, n) = 1$ and $e$ be the order of 10 in $\mathbb{Z}_n$. Then we know that the $e$th remainder is $a$, where

\[ a \cdot 10^e \equiv a \bmod n. \tag{4.3} \]

What is the block? Our old friend the Division Algorithm gives the answer: 

Theorem 4.70. Let $\gcd(10, n) = \gcd(a, n) = 1$. If $e$ is the order of 10 in $\mathbb{Z}_n$, then the sequence of digits in the block of the decimal expansion for $a/n$ is

\[ \frac{a(10^e - 1)}{n}. \]

Proof. The above discussion shows that the block is the partial quotient up to a remainder of $a$ in the division. Rewrite Eq. (4.3) as

\[ a \cdot 10^e = qn + a. \]

Solving for $q$ gives $q = a(10^e - 1)/n$, which is the desired result. ■
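Theorem 4.70 turns the search for a block into one exact division. Here is a small Python sketch under the theorem's hypotheses (with $1 \le a < n$); the helper names `order_of_10` and `block` are assumptions made for illustration, and padding with leading zeros to length $e$ is a choice of this sketch so that blocks such as that of 1/13 display all $e$ digits.

```python
from math import gcd


def order_of_10(n):
    """Smallest e >= 1 with 10**e ≡ 1 (mod n); needs gcd(10, n) == 1."""
    if gcd(10, n) != 1:
        raise ValueError("10 is not a unit modulo n")
    e, power = 1, 10 % n
    while power != 1 % n:
        power = (power * 10) % n
        e += 1
    return e


def block(a, n):
    """Digits of the repeating block of a/n as a string of length e."""
    e = order_of_10(n)
    q = a * (10**e - 1) // n   # Theorem 4.70: this division is exact
    return str(q).zfill(e)     # keep leading zeros of the block


print(block(6, 7))    # 857142
print(block(1, 13))   # 076923
print(block(2, 13))   # 153846
```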

Example 4.71. For the various k/7,

\begin{align*}
1/7 &= .142857\ldots & \text{and} \quad 1(10^6 - 1)/7 &= 142857 \\
2/7 &= .285714\ldots & \text{and} \quad 2(10^6 - 1)/7 &= 285714 \\
3/7 &= .428571\ldots & \text{and} \quad 3(10^6 - 1)/7 &= 428571 \\
4/7 &= .571428\ldots & \text{and} \quad 4(10^6 - 1)/7 &= 571428 \\
5/7 &= .714285\ldots & \text{and} \quad 5(10^6 - 1)/7 &= 714285 \\
6/7 &= .857142\ldots & \text{and} \quad 6(10^6 - 1)/7 &= 857142.
\end{align*}

Figure 4.12. The blocks of k/7 for $1 \le k < 7$.

It’s an interesting calculation to go through the same process for the various k/13. ▲

Let $a_0 = a$.


Exercises 


4.87 Find the order of 10 modulo n (if it exists) for each value of n, and verify that the
decimal expansion of 1/n has period equal to the order.


(i) 7 

(ii) 9 

(iii) 3 

(iv) 6 

(v) 8 

(vi) 11 

(vii) 13 

(viii) 39 

(ix) 22 

(x) 41 

(xi) 73 

(xii) 79 

(xiii) 123 

(xiv) 71 

(xv) 61 


4.88 * Finish the proof of Corollary 4.62 by showing that the decimal expansion of
$a/n$, where $1 \le a < n$ and $\gcd(a, n) = 1$, will repeat only after a remainder of $a$
occurs in the long division of $a$ by $n$.

4.89 If

\[ L = \{u_1, u_2, \ldots, u_{\phi(n)}\} \]

is the list of units in $\mathbb{Z}_n$ and $u$ is any unit, show that the elements of

\[ uL = \{uu_1, uu_2, \ldots, uu_{\phi(n)}\} \]

are all distinct.

4.90 Theorem 4.70 says that if $\gcd(10, n) = \gcd(a, n) = 1$ and $e$ is the order of 10 in
$\mathbb{Z}_n$, then the block in the decimal expansion of $a/n$ is $a(10^e - 1)/n$. Why is this
latter fraction an integer?

4.91 Suppose that $\gcd(10, n) = \gcd(a, n) = 1$, $e$ is the order of 10 in $\mathbb{Z}_n$, and $c_j$ is the
remainder when $a \cdot 10^j$ is divided by $n$. If the block of $a/n$ is $.a_1 a_2 \cdots a_e$, show,
for $1 \le j \le e$, that

\[ a_j = \frac{10 c_{j-1} - c_j}{n}. \]


4.92 Just as there are $b$-adic expansions of integers, there are also such expansions
for rational numbers. For example, if we are working in base 5, then $1/5 = .1$,
$1/5^2 = .01$, $1/5^3 = .001$, and so on. Find rational numbers (written as $a/b$ in
the usual way) that are equal to each 5-adic expansion.

(i) .2 (ii) .03 (iii) .1111... 

(iv) .171717... (v) .001001001... 

4.93 Find the 5-adic expansion of each rational number 

(i) f (ii) ^5 (iii) \ 

(i y ) Tq: ( v ) 75 (vi) 25 

4.94 Show that a positive rational number has a terminating $b$-adic expansion for some
positive base $b$.

4.95 (i) What is the decimal expansion of 1/9801?

(ii) What is the period of this expansion? 

Hint. 


10000 1 



4.96 (i) What is the decimal expansion of 1/9899?
(ii) What is the period of this expansion?

Abstract Algebra


Why do mathematics? The answer is simple: we want to understand a corner 
of our universe. But we are surrounded by so many different things that it 
makes sense to organize and classify, thereby imposing some order. Naturally, 
we draw on our experience, so we can decide what we think is important and 
what is less interesting. 

Numbers and calculations have been very useful for thousands of years, and 
we have chosen to study them. In particular, we have seen that certain arith- 
metic and geometric ideas help us understand how numbers behave. Some- 
times the connections are quite surprising: for example, the relation between 
Pythagorean triples and the method of Diophantus. We have also developed 
several tools to facilitate our work: an efficient notation and mathematical in- 
duction; the complex numbers and congruences have also enhanced our view. 

There are unexpected consequences. As we investigate, we find that even 
when we find a satisfying answer, new, interesting questions arise. Even though 
the method of Diophantus explains almost every question we might have about 
Pythagorean triples, it also suggests that we replace the unit circle by other 
conic sections, thereby giving insight into some calculus. 

It is now time to organize the number theory we have studied. The main 
idea is to abstract common features of integers, rational numbers, complex 
numbers, and congruences, as we did when we introduced the definition of 
commutative ring. This will further our understanding of number theory itself 
as well as other important topics, such as polynomials. 

This chapter continues this adventure. In Section 5.1, we study domains , 
an important class of commutative rings. In Section 5.2, we study polynomi- 
als, one of the most important examples of commutative rings. We will show, 
in particular, that any commutative ring can serve as coefficients in a ring of 
polynomials. Section 5.3 introduces homomorphisms, which allow us to compare
and contrast commutative rings, as well as to make precise the idea that
two rings have structural similarities.

The rest of the chapter is devoted to the structure of rings of polynomi- 
als. Using the results developed in Section 5.3, we’ll see how the two main 
rings in high school mathematics — Z and polynomials in one variable with 
coefficients in a field — share many structural similarities. For example, every 
polynomial has a unique factorization as a product of primes (primes here are 
called irreducible polynomials ). And we’ll also revisit many of the theorems 
from advanced high school algebra, like the factor theorem and the fact that 
polynomials of degree n have at most n roots, putting these results in a more 
general setting.

Many texts say integral domain instead of domain.


5.1 Domains and Fraction Fields 

We now introduce a class of commutative rings that satisfy a property enjoyed
by our favorite rings: any product of nonzero integers is nonzero. On the other
hand, there are commutative rings in which a product of nonzero elements is 0.
For example, $2 \times 3 = 0$ in $\mathbb{Z}_6$, even though both 2 and 3 are nonzero. We now
promote the first property (products of nonzero elements are nonzero) to a definition,
for there are interesting examples (e.g., polynomials) where it occurs.

Definition. A domain D is a nonzero commutative ring in which every prod- 
uct of nonzero elements is nonzero. 

A nonzero element $a$ in a commutative ring $R$ is called a zero divisor if
there is a nonzero $b \in R$ with $ab = 0$. Using this language, we can describe a
domain as a nonzero commutative ring without zero divisors.

The commutative ring of integers $\mathbb{Z}$ is a domain, but $\mathbb{Z}_m$ is not a domain
when $m$ is composite: if $m = ab$ for $0 < a \le b < m$, then $a \ne 0$ and $b \ne 0$ in $\mathbb{Z}_m$,
but $ab = m = 0$. Recall the Boolean ring $2^X$ in Example 4.47: its elements
are all the subsets of a set $X$, and its operations are symmetric difference and
intersection. If $X$ has at least two elements, then there are nonempty disjoint
subsets $A$ and $B$; that is, $A \cap B = \varnothing$. Thus, $A$ and $B$ are nonzero elements of
$2^X$ whose product $AB = 0$, and so $2^X$ is not a domain.
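To see zero divisors concretely, here is a brief Python sketch (not from the text; the function name `zero_divisors` is an illustrative assumption) that lists the zero divisors of $\mathbb{Z}_m$ by brute force.

```python
def zero_divisors(m):
    """Nonzero a in Z_m with a*b ≡ 0 (mod m) for some nonzero b in Z_m."""
    return [a for a in range(1, m)
            if any((a * b) % m == 0 for b in range(1, m))]


print(zero_divisors(6))   # [2, 3, 4]
print(zero_divisors(7))   # []
```

The composite modulus 6 has zero divisors, so $\mathbb{Z}_6$ is not a domain, while the prime modulus 7 has none.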


How to Think About It. Everyone believes that $\mathbb{Z}$ is a domain (the product
of two nonzero integers is nonzero), but a proof from first principles is surprisingly
involved. If you grant that $\mathbb{Z}$ sits inside $\mathbb{R}$, a fact that is a cornerstone of
elementary school arithmetic using the “number line representation” of $\mathbb{R}$, and
if you grant the fact that $\mathbb{R}$ is a field, then there is a simple proof (see Proposition 5.3).
But that’s a fair amount of “granting.” We’ll simply assume that $\mathbb{Z}$ is
a domain.


Proposition 5.1. A nonzero commutative ring $D$ is a domain if and only if it
satisfies the cancellation law: If $ab = ac$ and $a \ne 0$, then $b = c$.

Proof. Assume that $D$ is a domain. If $ab = ac$ and $a \ne 0$, then $0 = ab - ac = a(b - c)$.
Since $a \ne 0$, we must have $b - c = 0$. Hence, $b = c$ and the
cancellation law holds.

Conversely, assume that the cancellation law holds, and suppose, on the contrary,
that $ab = 0$, where both $a$ and $b$ are nonzero. Rewrite
this as $ab = a \cdot 0$. Since $a \ne 0$ and the cancellation law holds, we have $b = 0$,
a contradiction. Hence, $D$ is a domain. ■

Corollary 5.2. Every field $F$ is a domain.

Proof. The cancellation law holds: if $a \in F$ is nonzero and $ab = ac$, then
$a^{-1}ab = a^{-1}ac$ and $b = c$. ■

Proposition 5.3. Every subring $S$ of a field $F$ is a domain.

Proof. By Corollary 5.2, $F$ is a domain. If $a, b \in S$ are nonzero, then their
product (in $F$, and hence in $S$) is also nonzero. Hence, $S$ is a domain. ■

For example, if we assume that $\mathbb{R}$ is a field and $\mathbb{Z}$ is a subring of $\mathbb{R}$, then $\mathbb{Z}$ is
a domain. The proof of Proposition 5.3 shows more: every subring of a domain
is a domain.

Fraction Fields 

The converse of Proposition 5.3 — every domain is a subring of a field — is 
much more interesting than the proposition. Just as the domain Z is a sub- 
ring of the field Q, so, too, is any domain a subring of its fraction field. We’ll 
construct such a field containing a given domain using the construction of Q 
from Z as inspiration. This is not mere generalization for generalization’s sake; 
we shall see, for example, that it will show that certain polynomial rings are 
subrings of fields of rational functions. 


How to Think About It. Warning! Over the years, school curricula have 
tried using the coming discussion to teach fractions to precollege students, 
even to fourth graders. This is a very bad idea. Experience should precede 
formalism and, in this particular case, introducing rational numbers as ordered 
pairs of integers was a pedagogical disaster. 


Elementary school teachers often say that 2/4 = 3/6 because $2 \times 6 = 4 \times 3$.
Sure enough, both products are 12, but isn’t this a non sequitur? Does it make
sense? Why should cross multiplication give equality? Teachers usually continue:
suppose you have two pizzas of the same diameter, the first cut into four
pieces of the same size, the second into six pieces of the same size; eating two
slices of the first pizza is just as filling as eating three slices of the second. This
makes more sense, and it tastes better, too. But wouldn’t it have been best had
the teacher said that if $a/b = c/d$, then multiplying both sides by $bd$ gives
$ad = bc$; and, conversely, if $ad = bc$, multiplying both sides by $d^{-1}b^{-1}$
gives $a/b = c/d$?

What is j ? What is a fraction? A fraction is determined by a pair of integers- 
its numerator and denominator — and so we start with ordered pairs. Let X be 
the set of all ordered pairs $(a, b)$ of integers with $b \ne 0$ (informally, we are
thinking of $a/b$ when we write $(a, b)$). Define cross multiplication to be the
relation on $X$

\[ (a, b) \equiv (c, d) \quad \text{if } ad = bc. \]

This is an equivalence relation. It is reflexive: $(a, b) \equiv (a, b)$ because $ab = ba$.
It is symmetric: if $(a, b) \equiv (c, d)$, then $(c, d) \equiv (a, b)$ because $ad = bc$
implies $cb = da$. We claim it is transitive: if $(a, b) \equiv (c, d)$ and $(c, d) \equiv (e, f)$,
then $(a, b) \equiv (e, f)$. Since $(a, b) \equiv (c, d)$, we have $ad = bc$, so that
$adf = bcf$; similarly, $(c, d) \equiv (e, f)$ gives $cf = de$, so that $bcf = bde$.
Thus,

\[ adf = bcf = bde. \]

Hence, $adf = bde$ and, canceling $d$ (which is not 0), gives $af = be$; that is,
$(a, b) \equiv (e, f)$.


How to Think About It. One reason cross multiplication is important is that 
it converts many problems about fractions into problems about integers.

Fraction field? Stay tuned.


Lemma 5.4. If $D$ is a domain and $X$ is the set of all $(a, b) \in D \times D$ with
$b \ne 0$, then cross multiplication is an equivalence relation on $X$.

Proof. The argument given above for $\mathbb{Z}$ is valid for $D$. The assumption that $D$
is a domain is present so that we can use the cancellation law to prove transitivity. ■

Notation. If $D$ is a domain, the equivalence class of $(a, b) \in X \subseteq D \times D$ is
denoted by

\[ [a, b]. \]

Specialize Lemma A.16 in Appendix A.2 to the relation $\equiv$ on $X$: $[a, b] = [c, d]$
if and only if $(a, b) \equiv (c, d)$; that is, $[a, b] = [c, d]$ if and only if
$ad = bc$.

Let’s finish the story in the context of arbitrary domains. 

Definition. The fraction field of a domain $D$ is

\[ \operatorname{Frac}(D) = \{[a, b] : a, b \in D \text{ and } b \ne 0\}. \]


How to Think About It. In the back of our minds, we think of [a,b] as 
the fraction a/b. But, in everyday experience, fractions (especially rational 
numbers) are used in calculations — they can be added, multiplied, subtracted, 
and divided. The next theorem equips $\operatorname{Frac}(D)$ with binary operations that will
look familiar to you if you keep thinking that $[a, b]$ stands for $a/b$.


Theorem 5.5. Let $D$ be a domain.

(i) $\operatorname{Frac}(D)$ is a field if we define

\[ [a, b] + [c, d] = [ad + bc, bd] \quad \text{and} \quad [a, b][c, d] = [ac, bd]. \]

(ii) The subset $D'$ of $\operatorname{Frac}(D)$, defined by

\[ D' = \{[a, 1] : a \in D\}, \]

is a subring of $\operatorname{Frac}(D)$.

(iii) Every $h \in \operatorname{Frac}(D)$ has the form $uv^{-1}$, where $u, v \in D'$.

Proof. (i) Define addition and multiplication on $F = \operatorname{Frac}(D)$ as in the
statement. The symbols $[ad + bc, bd]$ and $[ac, bd]$ in the definitions make
sense, for $b \ne 0$ and $d \ne 0$ imply $bd \ne 0$, because $D$ is a domain. The
proof that $F$ is a field is now a series of routine steps.

We show that addition $F \times F \to F$ is well-defined (i.e., single-valued):
if $[a, b] = [a', b']$ and $[c, d] = [c', d']$, then $[ad + bc, bd] = [a'd' + b'c', b'd']$.
Now $ab' = a'b$ and $cd' = c'd$. Hence,

\begin{align*}
(ad + bc)b'd' &= adb'd' + bcb'd' = (ab')dd' + bb'(cd') \\
&= a'bdd' + bb'c'd = (a'd' + b'c')bd;
\end{align*}

that is, $(ad + bc, bd) \equiv (a'd' + b'c', b'd')$, as desired. A similar computation
shows that multiplication $F \times F \to F$ is well-defined.

The verification that $F$ is a commutative ring is also routine, and it is
left as Exercise 5.5 below, with the hints that the zero element is $[0, 1]$,
the identity is $[1, 1]$, and the negative of $[a, b]$ is $[-a, b]$.

To see that $F$ is a field, observe first that if $[a, b] \ne 0$, then $a \ne 0$
(for the zero element of $F$ is $[0, 1] = [0, b]$). We claim that the inverse
of $[a, b]$ is $[b, a]$, for $[a, b][b, a] = [ab, ab] = [1, 1]$. Therefore, every
nonzero element of $F$ has an inverse in $F$.

(ii) We show that $D'$ is a subring of $F$:

\begin{align*}
[1, 1] &\in D' \\
[a, 1] + [c, 1] &= [a + c, 1] \in D' \\
[a, 1][c, 1] &= [ac, 1] \in D'.
\end{align*}

(iii) If $h = [a, b]$, where $b \ne 0$, then

\[ h = [a, 1][1, b] = [a, 1][b, 1]^{-1}. \] ■
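The well-definedness argument in part (i) can be tried out on examples. The sketch below is not from the text: it takes $D = \mathbb{Z}$, represents a class $[a, b]$ by a Python tuple `(a, b)`, and uses the hypothetical helper names `equiv`, `add`, and `mul`. It checks that replacing representatives by equivalent ones gives equivalent sums and products.

```python
def equiv(p, q):
    """Cross multiplication: (a, b) ≡ (c, d) when ad = bc."""
    (a, b), (c, d) = p, q
    return a * d == b * c


def add(p, q):
    """[a, b] + [c, d] = [ad + bc, bd], computed on representatives."""
    (a, b), (c, d) = p, q
    return (a * d + b * c, b * d)


def mul(p, q):
    """[a, b][c, d] = [ac, bd], computed on representatives."""
    (a, b), (c, d) = p, q
    return (a * c, b * d)


# Two representatives of each class: [1, 2] = [3, 6] and [2, 3] = [-4, -6].
p, p2 = (1, 2), (3, 6)
q, q2 = (2, 3), (-4, -6)
assert equiv(p, p2) and equiv(q, q2)

# Different representatives give different pairs but the same class.
print(add(p, q), add(p2, q2), equiv(add(p, q), add(p2, q2)))  # (7, 6) (-42, -36) True
print(mul(p, q), mul(p2, q2), equiv(mul(p, q), mul(p2, q2)))  # (2, 6) (-12, -36) True
```

The pairs themselves differ, which is exactly why the operations must be defined on equivalence classes rather than on ordered pairs.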


Notation. From now on, we use standard notation: If $D$ is a domain, then the
element $[a, b]$ in $\operatorname{Frac}(D)$ will be denoted by

\[ a/b. \]

Of course, $\mathbb{Q} = \operatorname{Frac}(\mathbb{Z})$. Not surprisingly, elementary school teachers are
correct: it is, indeed, true that $a/b = c/d$ if and only if $ad = bc$.

But be careful: for arbitrary fraction fields, the notation $a/b$ is just an alias for
$[a, b]$. For $\mathbb{Q}$, the notation is loaded with all kinds of extra meanings that don’t
carry over to the general setting (for example, as a number having a decimal
expansion obtained by dividing $a$ by $b$).

We started this section with two goals: to show that every domain is a subring
of a field, and to make precise the notion of “fraction.” We’ve done the
second, but we didn’t quite show that a domain $D$ is a subring of $\operatorname{Frac}(D)$;
instead, we showed that $D'$ is a subring of $\operatorname{Frac}(D)$, where $D'$ consists of all
$[a, 1]$ for $a \in D$. Now $D$ and $D'$ do bear a strong resemblance to each other.
If we identify each $a$ in $D$ with $[a, 1]$ in $D'$ (which is reminiscent of identifying
an integer $m$ with the fraction $m/1$), then not only do elements correspond
nicely, but so, too, do the operations: $a + b$ corresponds to $[a + b, 1]$:

\[ [a, 1] + [b, 1] = [a \cdot 1 + 1 \cdot b, 1 \cdot 1] = [a + b, 1]; \]

similarly, $ab$ corresponds to $[ab, 1] = [a, 1][b, 1]$. In Section 5.3, we will discuss
the important idea of isomorphism, which will make our identification here
precise. For the moment, you may regard $D$ and $D'$ as algebraically the same.


Exercises 

5.1 Let $R$ be a domain. If $a \in R$ and $a^2 = a$, prove that $a = 0$ or $a = 1$. Compare
with Exercise 4.40 on page 164.

5.2 Prove that the Gaussian integers $\mathbb{Z}[i]$ and the Eisenstein integers $\mathbb{Z}[\omega]$ are domains.

5.3 * Prove that $\mathbb{Z}_m$ is a domain if and only if $\mathbb{Z}_m$ is a field. Conclude, using Theorem 4.43,
that $\mathbb{Z}_m$ is a domain if and only if $m$ is prime.

5.4 Prove that every finite domain $D$ (i.e., $|D| < \infty$) is a field.

Hint. Use the Pigeonhole Principle, Exercise A.11 on page 419.

5.5 * Complete the proof of Theorem 5.5.

5.6 Let $\mathbb{Q}(i) = \{r + si : r, s \in \mathbb{Q}\}$ be the set of complex numbers whose real and
imaginary parts are rational.

(i) Show that $\mathbb{Q}(i)$ is a field.

(ii) True or false? $\operatorname{Frac}(\mathbb{Z}[i]) = \{[r + si, 1] : r + si \in \mathbb{Q}(i)\}$.

5.7 *

(i) Show that $\mathbb{Q}(\omega) = \{r + s\omega : r, s \in \mathbb{Q}\}$ is a field, where $\omega = e^{2\pi i/3}$ is a
cube root of unity.

(ii) True or false? $\operatorname{Frac}(\mathbb{Z}[\omega]) = \{[r + s\omega, 1] : r + s\omega \in \mathbb{Q}(\omega)\}$. Why?

5.2 Polynomials 

You are surely familiar with polynomials; since they can be added and multiplied,
it is not surprising that they form commutative rings. However, there
are some basic questions about them whose answers may be less familiar. Is a
polynomial a function? Is $x$ a variable? If not, just what is $x$? After all, we first
encounter polynomials as real-valued functions having simple formulas; for
example, $f(x) = x^3 - 2x^2 + 7$ is viewed as the function $f^\# : \mathbb{R} \to \mathbb{R}$ defined
by $f^\#(a) = a^3 - 2a^2 + 7$ for every $a \in \mathbb{R}$. But some polynomials have complex
coefficients. Is it legitimate to consider polynomials whose coefficients lie in
any commutative ring $R$? When are two polynomials equal? Every high school
algebra student would say that the functions defined by $f(x) = x^7 + 2x - 1$
and $g(x) = 3x + 6$ are not the same, because they are defined by different
polynomials. But these two functions are, in fact, equal when viewed as functions
$\mathbb{Z}_7 \to \mathbb{Z}_7$, a fact that you can check by direct calculation. Here’s another
example. Is it legitimate to treat $2x + 1$ as a polynomial whose coefficients lie
in $\mathbb{Z}_4$? If so, then $(2x + 1)^2 = 4x^2 + 4x + 1 = 1$ (for $4 = 0$ in $\mathbb{Z}_4$); that
is, the square of this linear polynomial is a constant! Sometimes polynomials
are treated as formal expressions in which $x$ is just a symbol, as, for example,
when you factor $x^6 - 1$ or expand $(x + 1)^5$. And sometimes polynomials are
treated as functions that can be graphed or composed. Both of these perspectives
are important and useful, but they are clearly different.
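The claim about $\mathbb{Z}_7$ can be checked by direct calculation. The following short Python sketch (an illustration added here; the default modulus parameter `m` is an assumption of the sketch) evaluates both polynomials at every element of $\mathbb{Z}_7$.

```python
def f(x, m=7):
    """f(x) = x^7 + 2x - 1, evaluated in Z_m."""
    return (x**7 + 2 * x - 1) % m


def g(x, m=7):
    """g(x) = 3x + 6, evaluated in Z_m."""
    return (3 * x + 6) % m


print([f(a) for a in range(7)])   # [6, 2, 5, 1, 4, 0, 3]
print([g(a) for a in range(7)])   # [6, 2, 5, 1, 4, 0, 3]
```

The two lists of values agree, so the polynomials define the same function on $\mathbb{Z}_7$ even though their coefficients differ.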

We now introduce polynomials rigorously, for this will enable us to answer 
these questions. In this section, we’ll first study polynomials from the formal 
viewpoint, after which we’ll consider polynomial functions. In the next sec- 
tion, we will see that the notion of homomorphism will link the formal and the 
function viewpoints, revealing their intimate connection. 


How to Think About It. As we said on page 193 in the context of fractions, 
rigorous developments should not be points of entry. One goal of this section is 
to put polynomials on a firm footing. This will prepare you for any future work 
you do with beginning algebra students, but it is in no way meant to take the 
place of all of the informal experience that’s necessary before the formalities 
can be appreciated and understood. 


We investigate polynomials in a very formal way, beginning with the allied 
notion of power series. A key observation is that one should pay attention to 
where the coefficients of polynomials live.

Definition. If $R$ is a commutative ring, then a formal power series over $R$ is a
sequence

\[ \sigma = (s_0, s_1, s_2, \ldots, s_i, \ldots); \]

the entries $s_i \in R$ are called the coefficients of $\sigma$.

Be patient. The reason for this terminology will be apparent in a few pages.
In the meantime, pretend that $(s_0, s_1, s_2, \ldots, s_i, \ldots)$ is really
$s_0 + s_1 x + s_2 x^2 + \cdots + s_i x^i + \cdots$.

A formal power series $\sigma$ over $R$ is a sequence, but a sequence is just a
function $\sigma : \mathbb{N} \to R$ (where $\mathbb{N}$ is the set of natural numbers) with $\sigma(i) = s_i$
for all $i \ge 0$. By Proposition A.2 in Appendix A.1, two sequences $\sigma$ and $\tau$ are
equal if and only if $\sigma(i) = \tau(i)$ for all $i \in \mathbb{N}$. So, formal power series are
equal if and only if they are equal “coefficient by coefficient.”

Proposition 5.6. Formal power series $\sigma = (s_0, s_1, s_2, \ldots, s_i, \ldots)$ and
$\tau = (t_0, t_1, t_2, \ldots, t_i, \ldots)$ over a commutative ring $R$ are equal if and only if $s_i = t_i$
for all $i \ge 0$.


In linear algebra, you may have seen the example of the vector space $V$ of all
polynomials of degree, say, 3 or less, with coefficients in $\mathbb{R}$. As a vector space,
$V$ can be thought of as $\mathbb{R}^4$, where the 4-tuple $(5, 6, 8, 9)$ corresponds to
the polynomial $5 + 6x + 8x^2 + 9x^3$.


How to Think About It. Discussions of power series in calculus usually
involve questions asking about those values of $x$ for which $s_0 + s_1 x + s_2 x^2 + \cdots$
converges. In most commutative rings, however, limits are not defined,
and so, in general, convergence of formal power series does not even make 
sense. Now the definition of formal power series is not very complicated, while 
limits are a genuinely new and subtle idea (it took mathematicians around 200 
years to agree on a proper definition). Since power series are usually introduced 
at the same time as limits, however, most calculus students (and ex-calculus 
students!) are not comfortable with them; the simple notion of power series is 
entangled with the sophisticated notion of limit. 

Today’s calculus classes do not follow the historical development. Calcu- 
lus was invented to answer a practical need; in fact, the word calculus arose 
because it described a branch of mathematics involving or leading to calcu- 
lations. In the 1600s, navigation on the high seas was a matter of life and 
death, and practical tools were necessary for the safety of boats crossing the 
oceans. One such tool was calculus, which is needed in astronomical calcula- 
tions. Newton realized that his definition of integral was complicated; telling 
a navigator that the integral of a function is some fancy limit of approxima- 
tions and fluxions would be foolish. To make calculus useful, he introduced 
power series (Newton discovered the usual power series for $\sin x$ and $\cos x$),
he assumed that most integrands occurring in applications have a power series 
expansion, and he further assumed that term-by-term integration was valid for 
them. Thus, power series were actually introduced as “long polynomials” in 
order to simplify using calculus in applications. 


Polynomials are special power series. 

Definition. A polynomial over a commutative ring $R$ is a formal power series
$\sigma = (s_0, s_1, \ldots, s_i, \ldots)$ over $R$ for which there exists some integer $n \ge 0$ with
$s_i = 0$ for all $i > n$; that is,

\[ \sigma = (s_0, s_1, \ldots, s_n, 0, 0, \ldots). \]

The zero polynomial, denoted by $\sigma = 0$, is the sequence $\sigma = (0, 0, 0, \ldots)$.

A polynomial has only finitely many nonzero coefficients; that is, it is a
“short power series.”


Some authors define the degree of the zero polynomial 0 to be $-\infty$,
where $-\infty + n = -\infty$ for every integer $n \in \mathbb{N}$ (this
is sometimes convenient). We choose not to assign a degree to 0 because,
in proofs, it often must be treated differently than other polynomials.


Definition. If $\sigma = (s_0, s_1, \ldots, s_n, 0, 0, \ldots)$ is a nonzero polynomial, then
there is $n \ge 0$ with $s_n \ne 0$ and $s_i = 0$ for all $i > n$. We call $s_n$ the leading
coefficient of $\sigma$, we call $n$ the degree of $\sigma$, and we denote the degree by
$n = \deg(\sigma)$.

The zero polynomial 0 does not have a degree because it has no nonzero
coefficients.


Etymology. The word degree comes from the Latin word meaning “step.”
Each term $s_i x^i$ (in the usual notation $s_0 + s_1 x + s_2 x^2 + \cdots + s_i x^i + \cdots$) has
degree $i$, and so the degrees suggest a staircase.

The word coefficient means “acting together to some single end.” Here, co- 
efficients collectively give one formal power series or one polynomial. 


Notation. If $R$ is a commutative ring, then

\[ R[[x]] \]

denotes the set of all formal power series over $R$, and

\[ R[x] \subseteq R[[x]] \]

denotes the set of all polynomials over $R$.

We want to make $R[[x]]$ into a commutative ring, and so we define addition
and multiplication of formal power series. Suppose that

\[ \sigma = (s_0, s_1, \ldots, s_i, \ldots) \quad \text{and} \quad \tau = (t_0, t_1, \ldots, t_i, \ldots). \]

Define their sum by adding term by term:

\[ \sigma + \tau = (s_0 + t_0, s_1 + t_1, \ldots, s_i + t_i, \ldots). \]

What about multiplication? The product of two power series is also computed
term by term; multiply formally and collect like powers of $x$:

\begin{align*}
(s_0 &+ s_1 x + s_2 x^2 + \cdots + s_i x^i + \cdots)(t_0 + t_1 x + t_2 x^2 + \cdots + t_j x^j + \cdots) \\
&= s_0(t_0 + t_1 x + t_2 x^2 + \cdots) + s_1 x(t_0 + t_1 x + t_2 x^2 + \cdots) + \cdots \\
&= (s_0 t_0 + s_0 t_1 x + s_0 t_2 x^2 + \cdots) + (s_1 t_0 x + s_1 t_1 x^2 + s_1 t_2 x^3 + \cdots) + \cdots \\
&= s_0 t_0 + (s_1 t_0 + s_0 t_1)x + (s_0 t_2 + s_1 t_1 + s_2 t_0)x^2 + \cdots.
\end{align*}

Motivated by this, we define multiplication of formal power series by

\[ \sigma\tau = (s_0 t_0,\; s_0 t_1 + s_1 t_0,\; s_0 t_2 + s_1 t_1 + s_2 t_0,\; \ldots); \]

more precisely,

\[ \sigma\tau = (c_0, c_1, \ldots, c_k, \ldots), \]

where $c_k = \sum_{i+j=k} s_i t_j = \sum_{i=0}^{k} s_i t_{k-i}$.

Proposition 5.7. If $R$ is a commutative ring, then $R[[x]]$, together with the
operations of addition and multiplication defined above, is a commutative ring.

Proof. Addition and multiplication are operations on $R[[x]]$: the sum and product
of two formal power series are also formal power series. Define zero to be
the zero polynomial, define the identity to be the polynomial $(1, 0, 0, \ldots)$, and
define the negative of $(s_0, s_1, \ldots, s_i, \ldots)$ to be $(-s_0, -s_1, \ldots, -s_i, \ldots)$.
Verifications of the axioms of a commutative ring are routine, and we leave them as
Exercise 5.8 on page 202. The only difficulty that might arise is proving the
associativity of multiplication. Hint: if $\rho = (r_0, r_1, \ldots, r_i, \ldots)$, then the $\ell$th
coordinate of the power series $\rho(\sigma\tau)$ turns out to be $\sum_{i+j+k=\ell} r_i(s_j t_k)$, while the
$\ell$th coordinate of the power series $(\rho\sigma)\tau$ turns out to be $\sum_{i+j+k=\ell} (r_i s_j)t_k$;
these are equal because associativity of multiplication in $R$ gives $r_i(s_j t_k) = (r_i s_j)t_k$
for all $i, j, k$. ■
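The definitions of sum and product translate directly into code. Below is a minimal Python sketch, not from the text, that stores a polynomial as a finite list of coefficients `[s_0, s_1, ...]` with integer entries (a genuine power series would have to be truncated); the function names `add` and `mul` are illustrative assumptions. It also spot-checks associativity on one example, which is of course not a proof.

```python
def add(sigma, tau):
    """Coefficientwise sum of two coefficient lists."""
    n = max(len(sigma), len(tau))
    sigma = sigma + [0] * (n - len(sigma))   # pad the shorter list with zeros
    tau = tau + [0] * (n - len(tau))
    return [s + t for s, t in zip(sigma, tau)]


def mul(sigma, tau):
    """Product: c_k is the sum of s_i * t_j over all i + j = k."""
    product = [0] * (len(sigma) + len(tau) - 1)
    for i, s in enumerate(sigma):
        for j, t in enumerate(tau):
            product[i + j] += s * t
    return product


rho, sigma, tau = [1, 1], [1, 2, 3], [4, 5]

# (1 + 2x + 3x^2)(4 + 5x) = 4 + 13x + 22x^2 + 15x^3
print(mul(sigma, tau))   # [4, 13, 22, 15]
print(add(sigma, tau))   # [5, 7, 3]

# Associativity on this example: rho(sigma tau) = (rho sigma)tau.
print(mul(rho, mul(sigma, tau)) == mul(mul(rho, sigma), tau))   # True
```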

We’ll see in a moment that the subset $R[x]$ of polynomials is a subring of
the commutative ring of formal power series $R[[x]]$.

Lemma 5.8. Let $R$ be a commutative ring and $\sigma, \tau \in R[x]$ be nonzero polynomials.

(i) Either $\sigma\tau = 0$ or $\deg(\sigma\tau) \le \deg(\sigma) + \deg(\tau)$.

(ii) If $R$ is a domain, then $\sigma\tau \ne 0$ and

\[ \deg(\sigma\tau) = \deg(\sigma) + \deg(\tau). \]

Proof. Let $\sigma = (s_0, s_1, \ldots)$ have degree $m$, let $\tau = (t_0, t_1, \ldots)$ have degree
$n$, and let $\sigma\tau = (c_0, c_1, \ldots)$.

(i) It suffices to prove that $c_k = 0$ for all $k > m + n$. By definition,

\[ c_k = s_0 t_k + \cdots + s_m t_{k-m} + s_{m+1} t_{k-m-1} + \cdots + s_k t_0. \]

All terms to the right of $s_m t_{k-m}$ are 0, because $\deg(\sigma) = m$, and so $s_i = 0$
for all $i \ge m + 1$. Now $s_m t_{k-m}$, as well as all the terms to its left, are 0,
because $\deg(\tau) = n$, and so $t_j = 0$ for all $j \ge k - m > n$.

(ii) We claim that $c_{m+n} = s_m t_n$, the product of the leading coefficients of $\sigma$
and $\tau$. Now

\begin{align*}
c_{m+n} &= \sum_{i+j=m+n} s_i t_j \\
&= s_0 t_{m+n} + \cdots + s_{m-1} t_{n+1} + s_m t_n + s_{m+1} t_{n-1} + \cdots.
\end{align*}

We show that every term $s_i t_j$ in $c_{m+n}$, other than $s_m t_n$, is 0. If $i < m$,
then $m - i > 0$; hence, $j = m - i + n > n$, and so $t_j = 0$; that is, each
term to the left of $s_m t_n$ is 0. If $i > m$, then $s_i = 0$, and each term to the
right of $s_m t_n$ is 0. Therefore,

\[ c_{m+n} = s_m t_n. \]

If $R$ is a domain, then $s_m \ne 0$ and $t_n \ne 0$ imply $s_m t_n \ne 0$; hence,
$c_{m+n} = s_m t_n \ne 0$, $\sigma\tau \ne 0$, and $\deg(\sigma\tau) = m + n$. ■

Exercise 5.22 on page 203 shows that if $R$ is a domain, then $R[[x]]$ is a
domain.


See Exercise 5.9 on 
page 202. 


Corollary 5.9. (i) If $R$ is a commutative ring, then $R[x]$ and $R$ are subrings
of $R[[x]]$.

(ii) If $R$ is a domain, then $R[x]$ is a domain.

Proof. (i) Let $\sigma, \tau \in R[x]$. Now $\sigma + \tau$ is a polynomial, for either $\sigma + \tau = 0$
or $\deg(\sigma + \tau) \le \max\{\deg(\sigma), \deg(\tau)\}$. By Lemma 5.8(i), the product
of two polynomials is also a polynomial. Finally, $1 = (1, 0, 0, \ldots)$ is a
polynomial, and so $R[x]$ is a subring of $R[[x]]$.

It is easy to check that $R' = \{(r, 0, 0, \ldots) : r \in R\}$ is a subring of
$R[x]$, and we may view $R'$ as $R$ by identifying $r \in R$ with $(r, 0, 0, \ldots)$.

(ii) If $\sigma$ and $\tau$ are nonzero polynomials, then Lemma 5.8(ii) shows that $\sigma\tau \ne 0$.
Therefore, $R[x]$ is a domain. ■
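The role of the domain hypothesis in Lemma 5.8 and Corollary 5.9 shows up clearly in small cases. The Python sketch below is an added illustration (the names `mul_mod` and `degree` are assumptions); it multiplies coefficient lists over $\mathbb{Z}_m$ and compares the degree of $(1 + 2x)^2$ over $\mathbb{Z}_4$, which is not a domain, with its degree over the field $\mathbb{Z}_5$.

```python
def mul_mod(sigma, tau, m):
    """Multiply coefficient lists over Z_m (coefficients reduced mod m)."""
    product = [0] * (len(sigma) + len(tau) - 1)
    for i, s in enumerate(sigma):
        for j, t in enumerate(tau):
            product[i + j] = (product[i + j] + s * t) % m
    return product


def degree(sigma):
    """Index of the last nonzero coefficient, or None for the zero polynomial."""
    nonzero = [i for i, s in enumerate(sigma) if s != 0]
    return nonzero[-1] if nonzero else None


square_mod_4 = mul_mod([1, 2], [1, 2], 4)   # (1 + 2x)^2 over Z_4
square_mod_5 = mul_mod([1, 2], [1, 2], 5)   # (1 + 2x)^2 over Z_5
print(square_mod_4, degree(square_mod_4))   # [1, 0, 0] 0
print(square_mod_5, degree(square_mod_5))   # [1, 4, 4] 2

# Over Z_4 a product of nonzero polynomials can even be zero: (2x)(2x) = 4x^2 = 0.
print(degree(mul_mod([0, 2], [0, 2], 4)))   # None
```

Over $\mathbb{Z}_4$ the degree of the product drops below the sum of the degrees, and a product of nonzero polynomials can vanish; over $\mathbb{Z}_5$ the degrees add, as part (ii) of the lemma predicts.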

We remark that $R$ can’t be a subring of $R[x]$ or of $R[[x]]$ because it’s not
even a subset of these rings. This is why we have introduced the subring $R'$.
A similar thing happened when we couldn’t view a domain $D$ as a subring
of its fraction field $\operatorname{Frac}(D)$. We shall return to this point when we discuss
isomorphisms.

From now on, we view $R[x]$ and $R[[x]]$ as rings, not merely as sets.

Definition. If $R$ is a commutative ring, then $R[x]$ is called the ring of polynomials
over $R$, and $R[[x]]$ is called the ring of formal power series over $R$.


Here is the link between this discussion and the usual notation. 


Definition. The indeterminate $x$ is the element

\[ x = (0, 1, 0, 0, \ldots) \in R[x]. \]


How to Think About It. Thus, $x$ is neither “the unknown” nor a variable;
it is a specific element in the commutative ring $R[x]$, namely, the polynomial
$(a_0, a_1, a_2, \ldots)$ with $a_1 = 1$ and all other $a_i = 0$; it is a polynomial of degree 1.

Note that we need the unit 1 in a commutative ring $R$ in order to define the
indeterminate in $R[x]$.


Lemma 5.10. Let $R$ be a commutative ring.

(i) If $\sigma = (s_0, s_1, \ldots, s_j, \ldots) \in R[[x]]$, then

\[ x\sigma = (0, s_0, s_1, \ldots, s_j, \ldots); \]

that is, multiplying by $x$ shifts each coefficient one step to the right.

(ii) If $n \ge 0$, then $x^n$ is the polynomial having 0 everywhere except for 1 in
the $n$th coordinate.

(iii) If $r \in R$ and $(s_0, s_1, \ldots, s_j, \ldots) \in R[[x]]$, then

\[ (r, 0, 0, \ldots)(s_0, s_1, \ldots, s_j, \ldots) = (rs_0, rs_1, \ldots, rs_j, \ldots). \]

Proof. (i) Write $x = (a_0, a_1, \ldots, a_i, \ldots)$, where $a_1 = 1$ and all other $a_i = 0$,
and let $x\sigma = (c_0, c_1, \ldots, c_k, \ldots)$. Now $c_0 = a_0 s_0 = 0$, because $a_0 = 0$.
If $k \ge 1$, then the only term in the sum $c_k = \sum_{i+j=k} a_i s_j$ that can be nonzero
is $a_1 s_{k-1} = s_{k-1}$, because $a_i = 0$ for $i \ne 1$; thus, for $k \ge 1$, the $k$th
coordinate $c_k$ of $x\sigma$ is $s_{k-1}$, and $x\sigma = (0, s_0, s_1, \ldots, s_i, \ldots)$.

(ii) Use induction and part (i).

(iii) This follows from the definition of multiplication. ■

If we identify $(r, 0, 0, \ldots)$ with $r$, as in the proof of Corollary 5.9, then
Lemma 5.10(iii) reads

\[ r(s_0, s_1, \ldots, s_i, \ldots) = (rs_0, rs_1, \ldots, rs_i, \ldots). \]
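All three parts of Lemma 5.10 can be observed on coefficient lists. This is a short Python sketch added for illustration, with polynomials stored as finite lists and a hypothetical helper `mul` implementing the product $c_k = \sum_{i+j=k} s_i t_j$.

```python
def mul(sigma, tau):
    """Product of coefficient lists: c_k = sum of s_i * t_j over i + j = k."""
    product = [0] * (len(sigma) + len(tau) - 1)
    for i, s in enumerate(sigma):
        for j, t in enumerate(tau):
            product[i + j] += s * t
    return product


x = [0, 1]            # the indeterminate (0, 1, 0, 0, ...)
sigma = [7, 8, 9]     # 7 + 8x + 9x^2

print(mul(x, sigma))       # [0, 7, 8, 9]   part (i): shift one step to the right
print(mul(x, mul(x, x)))   # [0, 0, 0, 1]   part (ii): x^3 has 1 in coordinate 3
print(mul([5], sigma))     # [35, 40, 45]   part (iii): the constant 5 scales each entry
```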

We can now recapture the usual polynomial notation. 

Proposition 5.11. Let $R$ be a commutative ring. If $\sigma = (s_0, s_1, \ldots, s_n, 0, 0, \ldots) \in R[x]$
has degree $n$, then

\[ \sigma = s_0 + s_1 x + s_2 x^2 + \cdots + s_n x^n, \]

where each element $s \in R$ is identified with the polynomial $(s, 0, 0, \ldots)$. Moreover,
if $\tau = t_0 + t_1 x + t_2 x^2 + \cdots + t_m x^m$, then $\sigma = \tau$ if and only if $n = m$
and $s_i = t_i$ for all $i \ge 0$.

Proof.

\begin{align*}
\sigma &= (s_0, s_1, \ldots, s_n, 0, 0, \ldots) \\
&= (s_0, 0, 0, \ldots) + (0, s_1, 0, \ldots) + \cdots + (0, 0, \ldots, 0, s_n, 0, \ldots) \\
&= s_0(1, 0, 0, \ldots) + s_1(0, 1, 0, \ldots) + \cdots + s_n(0, 0, \ldots, 0, 1, 0, \ldots) \\
&= s_0 + s_1 x + s_2 x^2 + \cdots + s_n x^n.
\end{align*}

The second statement merely rephrases Proposition 5.6, equality of polynomials,
in terms of the usual notation. ■

We shall use this familiar (and standard) notation from now on. As is customary,
we shall write

\[ f(x) = s_0 + s_1 x + s_2 x^2 + \cdots + s_n x^n \]

instead of $\sigma = (s_0, s_1, \ldots, s_n, 0, 0, \ldots)$.

Corollary 5.12. If $R$ is a commutative ring, then the polynomial ring $R[x]$ is
infinite.

Proof. By Proposition 5.11, $x^i \ne x^j$ if $i \ne j$. ■

If / (x) = so + S\x + Six 2 -\ 1- s„x n , where s n f=- 0, then .Vo is called its 

constant term and, as we have already said, s n is called its leading coefficient. 
If its leading coefficient s n = 1, then / (x) is called monic. Every polynomial 
other than the zero polynomial 0 (having all coefficients 0) has a degree. A 
constant polynomial is either the zero polynomial or a polynomial of degree 
0. Polynomials of degree 1, namely a + bx with b f 0, are called linear , 
polynomials of degree 2 are quadratic , degree 3s are cubic, then quartics, 
quintics, and so on. 


See Exercise 5.11 on 
page 202. 


So, two polynomials are equal in $R[x]$ if and only if they are equal “term by
term.”
Etymology. Quadratic polynomials are so called because the particular quadratic
$x^2$ gives the area of a square (quadratic comes from the Latin word
meaning four, which reminds us of the 4-sided figure); similarly, cubic polynomials
are so called because $x^3$ gives the volume of a cube. Linear polynomials
are so called because the graph of a linear polynomial $ax + b$ in $\mathbb{R}[x]$ is a line.


Vector spaces over arbitrary fields are discussed in Appendix A.3.


Exercises 

5.8 * Fill in the details and complete the proof of Proposition 5.7. 

5.9 * Suppose that $R$ is a commutative ring. In the proof of Corollary 5.9(i), we
defined $R'$ as the set of all power series of the form $(r, 0, 0, 0, \ldots)$ where $r \in R$, and
we said “we may view $R'$ as $R$ by identifying $r \in R$ with $(r, 0, 0, \ldots)$.” Show, if
$r, s \in R$, that

(i) $r + s$ is identified with $(r, 0, 0, 0, \ldots) + (s, 0, 0, 0, \ldots)$.

(ii) $rs$ is identified with $(r, 0, 0, 0, \ldots)(s, 0, 0, 0, \ldots)$.

5.10 If $(t_0, t_1, t_2, \ldots)$ is a power series over $R$ and $r \in R$, show that

$$(r, 0, 0, 0, \ldots)(t_0, t_1, t_2, \ldots) = (rt_0, rt_1, rt_2, \ldots).$$

5.11 * Suppose that $F$ is a field. Show that $F[[x]]$ is a vector space over $F$ where
addition is defined as addition of power series and scalar multiplication is defined
by

$$r(s_0, s_1, s_2, \ldots) = (rs_0, rs_1, rs_2, \ldots).$$

5.12 If $R$ is the zero ring, what are $R[x]$ and $R[[x]]$? Why?

5.13 Prove that if $R$ is a commutative ring, then $R[x]$ is never a field.

Hint. If $x^{-1}$ exists, what is its degree?

5.14 (i) Let $R$ be a domain. Prove that if a polynomial in $R[x]$ is a unit, then it is a
nonzero constant (the converse is true if $R$ is a field).

Hint. Compute degrees.

(ii) Show that $(2x + 1)^2 = 1$ in $\mathbb{Z}_4[x]$. Conclude that $2x + 1$ is a unit in $\mathbb{Z}_4[x]$,
and that the hypothesis in part (i) that $R$ be a domain is necessary.

5.15 * If $R$ is a commutative ring and

$$f(x) = s_0 + s_1 x + s_2 x^2 + \cdots + s_n x^n \in R[x]$$

has degree $n \ge 1$, define its formal derivative $f'(x) \in R[x]$ by

$$f'(x) = s_1 + 2s_2 x + 3s_3 x^2 + \cdots + n s_n x^{n-1};$$

if $f$ is a constant polynomial, define its derivative to be the zero polynomial.
Prove that the usual rules of calculus hold for derivatives in $R[x]$:

$$(f + g)' = f' + g'$$

$$(rf)' = r(f') \quad \text{if } r \in R$$

$$(fg)' = fg' + f'g$$

$$(f^n)' = n f^{n-1} f' \quad \text{for all } n \ge 1.$$
5.16 Take It Further. Define $\int : \mathbb{Q}[x] \to \mathbb{Q}[x]$ by

$$\int f = a_0 x + \tfrac{1}{2} a_1 x^2 + \cdots + \tfrac{1}{n+1} a_n x^{n+1} \in \mathbb{Q}[x],$$

where $f(x) = a_0 + a_1 x + \cdots + a_n x^n \in \mathbb{Q}[x]$.

(i) Prove that $\int (f + g) = \int f + \int g$.

(ii) If $D$ is the derivative, prove that $D \int = 1_{\mathbb{Q}[x]}$, but that $\int D \ne 1_{\mathbb{Q}[x]}$.

5.17 * Preview. Let $R$ be a commutative ring, let $f(x) \in R[x]$, and let $f'(x)$ be its
derivative.

(i) Prove that if $(x - a)^2$ is a divisor of $f$ in $R[x]$, then $x - a$ is a divisor of $f'$
in $R[x]$.

(ii) Prove that if $x - a$ is a divisor of both $f$ and $f'$, then $(x - a)^2$ is a divisor
of $f$.

5.18 (i) If $f(x) = ax^{2p} + bx^p + c \in \mathbb{Z}_p[x]$, prove that $f'(x) = 0$.

(ii) Prove that a polynomial $f(x) \in \mathbb{Z}_p[x]$ has $f'(x) = 0$ if and only if there is a
polynomial $g(x) = \sum a_n x^n$ with $f(x) = g(x^p)$; that is, $f(x) = \sum a_n x^{np}$.

5.19 If $p$ is a prime, show, in $\mathbb{Z}_p[x]$, that

$$(x + 1)^p = x^p + 1.$$


$1_{\mathbb{Q}[x]}$ denotes the identity function on the set $\mathbb{Q}[x]$.
Why didn’t we define $\int : R[x] \to R[x]$ for any commutative ring $R$?


5.20 *

(i) If $R$ is a domain and $\sigma = 1 + x + x^2 + \cdots + x^n + \cdots \in R[[x]]$, prove that
$\sigma$ is a unit in $R[[x]]$; in fact, $(1 - x)\sigma = 1$.

(ii) Show that $(1 - x)^2$ is a unit in $\mathbb{Q}[[x]]$, and express $1/(1 - x)^2$ as a power
series.

Hint. See Exercise 5.22 below.

5.21 Show that $1 - x - x^2$ is a unit in $\mathbb{Q}[[x]]$, and express $1/(1 - x - x^2)$ as a power
series.

5.22 *

(i) Prove that if $R$ is a domain, then $R[[x]]$ is a domain.

Hint. If $\sigma = (s_0, s_1, \ldots) \in R[[x]]$ is nonzero, define the order of $\sigma$, denoted
by $\mathrm{ord}(\sigma)$, to be the smallest $n \ge 0$ for which $s_n \ne 0$. If $R$ is a domain and
$\sigma, \tau \in R[[x]]$ are nonzero, prove that $\mathrm{ord}(\sigma\tau) \ge \mathrm{ord}(\sigma) + \mathrm{ord}(\tau)$, and use
this to conclude that $\sigma\tau \ne 0$.

(ii) Let $k$ be a field. Prove that a formal power series $\sigma \in k[[x]]$ is a unit if and
only if its constant term is nonzero; that is, $\mathrm{ord}(\sigma) = 0$.

(iii) Prove that if $\sigma \in k[[x]]$ and $\mathrm{ord}(\sigma) = n$, then $\sigma = x^n u$, where $u$ is a unit
in $k[[x]]$.

5.23 *

(i) Prove that $\mathrm{Frac}(\mathbb{Z}[x]) = \mathbb{Q}(x)$.

(ii) Let $D$ be a domain with $K = \mathrm{Frac}(D)$. Prove that $\mathrm{Frac}(D[x]) = K(x)$.

5.24 (i) Expand $(C^2 + S^2 - 1)(S^2 + 2CS - C^2)$, where $C$ and $S$ are elements in
some commutative ring.

(ii) Establish the trigonometric identity

$$\cos^2 x + 2\cos^3 x \sin x + 2\cos x \sin^3 x + \sin^4 x = \cos^4 x + 2\cos x \sin x + \sin^2 x.$$
5.25 Preview. Suppose $p$ is a prime and

$$f_p(x) = \frac{x^p - 1}{x - 1}.$$

(i) Show that $f_p(x) = x^{p-1} + x^{p-2} + \cdots + 1$.

(ii) Show that $f_p(x + 1) = x^{p-1}$ in $\mathbb{Z}_p[x]$.


Polynomial Functions 

Let’s now pass to viewing polynomials as functions. Each polynomial
$f(x) = s_0 + s_1 x + s_2 x^2 + \cdots + s_n x^n \in R[x]$ defines its associated polynomial function
$f^\# : R \to R$ by evaluation:

$$f^\#(a) = s_0 + s_1 a + s_2 a^2 + \cdots + s_n a^n \in R,$$

where $a \in R$ (in this way, we can view the indeterminate $x$ as a variable).
But polynomials and polynomial functions are different things. For example,
Corollary 5.12 says, for every commutative ring $R$, that there are infinitely
many polynomials in $R[x]$. On the other hand, if $R$ is finite (e.g., $R = \mathbb{Z}_m$),
then there are only finitely many functions from $R$ to itself, and so there are
only finitely many polynomial functions. Fermat’s Theorem ($a^p \equiv a \bmod p$
for every prime $p$) gives a concrete example of distinct polynomials defining
the same polynomial function; $f(x) = x^p - x$ is a nonzero polynomial, yet its
associated polynomial function $f^\# : \mathbb{Z}_p \to \mathbb{Z}_p$ is the constant function zero.

In Proposition 6.18, we will see that there’s a bijection between polynomials and
their associated polynomial functions if $R$ is an infinite field.
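The gap between the polynomial $x^p - x$ and its polynomial function can be seen by direct computation. The sketch below is illustrative only: it stores a polynomial over $\mathbb{Z}_p$ as a list of coefficients reduced mod $p$, and the helper names `evaluate` and `x_p_minus_x` are our own.

```python
def evaluate(coeffs, a, m):
    """Value of s_0 + s_1 a + ... + s_n a^n in Z_m, computed by Horner's rule."""
    value = 0
    for s in reversed(coeffs):
        value = (value * a + s) % m
    return value

def x_p_minus_x(p):
    """Coefficient list of x^p - x with coefficients reduced mod p."""
    coeffs = [0] * (p + 1)
    coeffs[1] = (-1) % p
    coeffs[p] = 1
    return coeffs

p = 5
f = x_p_minus_x(p)                               # not the zero polynomial
values = [evaluate(f, a, p) for a in range(p)]   # the function f# on Z_5
```

The list `f` has nonzero entries, so $f$ is a nonzero polynomial, while every entry of `values` is 0, so $f^\#$ is the zero function on $\mathbb{Z}_5$.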

Recall Example 4.31: if $R$ is a commutative ring, then $\mathrm{Fun}(R) = R^R$,
the set of all functions from $R$ to itself, is a commutative ring under pointwise
operations. We have seen that every polynomial $f(x) \in R[x]$ has an associated
polynomial function $f^\# \in \mathrm{Fun}(R)$, and we claim that

$$\mathrm{Poly}(R) = \{f^\# : f(x) \in R[x]\}$$

is a subring of $\mathrm{Fun}(R)$ (we admit that we are being very pedantic, but you
will see in the next section that there’s a good reason for this fussiness). The
identity $u$ of $R^R$ is the constant function with value 1, where 1 is the identity
element of $R$; that is, $u = 1^\#$, where 1 is the constant polynomial. We claim
that if $f(x), g(x) \in R[x]$, then

$$f^\# + g^\# = (f + g)^\# \quad \text{and} \quad f^\# g^\# = (fg)^\#.$$

(In the equation $f^\# + g^\# = (f + g)^\#$, the plus sign on the left means addition
of functions, while the plus sign on the right means the usual addition of polynomials
in $R[x]$; a similar remark holds for multiplication.) The proof of these
equations is left as Exercise 5.27 on page 206.


Etymology. In spite of the difference between polynomials and polynomial
functions, $R[x]$ is often called the ring of all polynomials in one variable
over $R$.


Since $k[x]$ is a domain when $k$ is a field, by Corollary 5.9(ii), it has a fraction
field.
Definition. If $k$ is a field, then the fraction field $\mathrm{Frac}(k[x])$ of $k[x]$, denoted
by

$$k(x),$$

is called the field of rational functions over $k$.


How to Think About It. By convention, the elements of k(x) are called 
rational “functions” but they are simply elements of the fraction field for k[x]. 
Of course, a rational function can be viewed as an actual function via evalu- 
ation at elements of k, in the same way that a polynomial in k[x] gives rise 
to its associated polynomial function defined on k. But the domain of such a 
rational function may not be all of k (why?). 

We’ll use the standard notation for elements in fraction fields (introduced
on page 195) for rational functions over a field: $[f, g]$ will be denoted by $f/g$.


Proposition 5.13. If $p$ is prime, then the field of rational functions $\mathbb{Z}_p(x)$ is
an infinite field containing $\mathbb{Z}_p$ as a subfield.

Proof. By Corollary 5.12, $\mathbb{Z}_p[x]$ is infinite, because the powers $x^n$, for $n \in \mathbb{N}$,
are distinct. Thus, its fraction field, $\mathbb{Z}_p(x)$, is an infinite field containing $\mathbb{Z}_p[x]$
as a subring. But $\mathbb{Z}_p[x]$ contains $\mathbb{Z}_p$ as a subring, by Corollary 5.9. ■

Notation. We’ve been using $\mathbb{Z}_p$ to stand for the integers mod $p$, and we know
that it is a field. There are other finite fields that do not have a prime number
of elements; you met one, $\mathbb{F}_4$, in Exercise 4.55 on page 165. It’s customary to
denote a field with $q$ elements by

$$\mathbb{F}_q.$$

It turns out that $q = p^n$ for some prime $p$ and some $n \ge 1$. Moreover, there
exists essentially only one field with $q$ elements, a fact we’ll prove in Chapter 7.
In particular, there is only one field with exactly $p$ elements and so, from
now on, we’ll use the notations

$$\mathbb{Z}_p = \mathbb{F}_p$$

interchangeably (we’ll use $\mathbb{F}_p$ when we’re viewing it as a field).

Let’s now consider polynomials over $R$ in two variables $x$ and $y$. A quadratic
polynomial $ax^2 + bxy + cy^2 + dx + ey + f$ can be rewritten as

$$ax^2 + (by + d)x + (cy^2 + ey + f);$$

that is, it is a polynomial in $x$ with coefficients in $R[y]$. If we write $A = R[y]$,
then it is clear that $A[x]$ is a commutative ring.

Definition. If $R$ is a commutative ring, then $R[x, y] = A[x]$, where $A = R[y]$,
is the ring of all polynomials over $R$ in two variables.

By induction, we can form the commutative ring $R[x_1, x_2, \ldots, x_n]$ of all
polynomials in $n$ variables over $R$:

$$R[x_1, x_2, \ldots, x_{n+1}] = (R[x_1, x_2, \ldots, x_n])[x_{n+1}].$$


We can define $R(x)$ for arbitrary domains $R$. See Exercise 5.23 on page 203.


Well, $\mathbb{Z}_p(x)$ contains a domain with a “strong resemblance” to $\mathbb{Z}_p[x]$
(see page 195). We’ll make this precise in Section 5.3.
Corollary 5.9 can now be generalized, by induction on $n$, to say that if $D$
is a domain, then so is $D[x_1, x_2, \ldots, x_n]$; we call $\mathrm{Frac}(D[x_1, x_2, \ldots, x_n])$ the
ring of rational functions in $n$ variables. Exercise 5.23 on page 203 can be
generalized to several variables: if $K = \mathrm{Frac}(D)$, then

$$\mathrm{Frac}(D[x_1, x_2, \ldots, x_n]) = K(x_1, x_2, \ldots, x_n);$$

its elements have the form $f/g$, where $f, g \in K[x_1, x_2, \ldots, x_n]$ and $g \ne 0$.


Exercises 

5.26 Let $R$ be a commutative ring. Show that if two polynomials $f(x), g(x) \in R[x]$
are equal, then their associated polynomial functions are equal; that is, $f^\# = g^\#$.

5.27 * If $R$ is a commutative ring, prove that $\mathrm{Poly}(R)$ is a subring of $\mathrm{Fun}(R) = R^R$.

5.28 True or false, with reasons:

(i) $(x^2 - 9)/(x^2 - 2x - 3) = (x + 3)/(x + 1)$ in $\mathbb{Q}(x)$.

(ii) What are the domains of the functions $x \mapsto (x^2 - 9)/(x^2 - 2x - 3)$ and
$x \mapsto (x + 3)/(x + 1)$? Are the functions equal?

5.3 Homomorphisms 

The question whether two given commutative rings R and S are somehow the 
same has already arisen, at least twice. 

(i) On page 195 we said 

For the moment, you may regard D and D' as algebraically the 
same. 

(ii) And on page 201 we said 

If we identify $(r, 0, 0, \ldots)$ with $r$, then Lemma 5.10(iii) reads
$r(s_0, s_1, \ldots, s_i, \ldots) = (rs_0, rs_1, \ldots, rs_i, \ldots)$.

What does “the same” mean in statement (i)? What does “identify” mean in
statement (ii)? More important, if $R$ is a commutative ring, we wish to compare
the (formal) polynomial ring $R[x]$ with the ring $\mathrm{Poly}(R)$ of all polynomial
functions on $R$.

We begin our discussion by considering the ring $\mathbb{Z}_2$; it has two elements, the
congruence classes 0, 1, and the following addition and multiplication tables.

      +  |  0   1            ×  |  0   1
     ----+---------         ----+---------
      0  |  0   1            0  |  0   0
      1  |  1   0            1  |  0   1

The two words even, odd also form a commutative ring, call it V; its addition
and multiplication are pictured in the following tables.

      +    |  even   odd          ×    |  even   odd
     ------+-------------        ------+-------------
      even |  even   odd          even |  even   even
      odd  |  odd    even         odd  |  even   odd
Thus, odd + odd = even and odd × odd = odd. It is clear that the commutative
rings $\mathbb{Z}_2$ and V are distinct; on the other hand, it is equally clear that there
is no significant difference between them. The elements of $\mathbb{Z}_2$ are given in
terms of numbers; those of V in terms of words. We may think of V as a translation
of $\mathbb{Z}_2$ into another language. And more than just a correspondence of
elements, the operations of addition and multiplication (that is, the two tables)
get translated, too.

A reasonable way to compare two systems is to set up a function between
them that preserves certain essential structural properties (we hinted at this
idea earlier when we noted that a ring $R$ is essentially a subring of $R[x]$).
The notions of homomorphism and isomorphism will make this intuitive idea
precise. Here are the definitions; we will discuss what they mean afterward.

Definition. Let $R$ and $S$ be commutative rings. A homomorphism is a function
$\varphi : R \to S$ such that, for all $a, b \in R$,

(i) $\varphi(a + b) = \varphi(a) + \varphi(b)$,

(ii) $\varphi(ab) = \varphi(a)\varphi(b)$,

(iii) $\varphi(1) = 1$ (the 1 on the left-hand side is the identity of $R$; the 1 on the
right-hand side is the identity of $S$).

If $\varphi$ is also a bijection, then $\varphi$ is called an isomorphism. Two commutative
rings $R$ and $S$ are called isomorphic, denoted by $R \cong S$, if there exists an
isomorphism $\varphi : R \to S$ between them.

In the definition of a homomorphism $\varphi : R \to S$, the + on the left-hand side
is addition in $R$, while the + on the right-hand side is addition in $S$; similarly
for products. A more complete notation for a commutative ring would display
its addition, multiplication, and unit: instead of $R$, we could write $(R, +, \times, 1)$.
Similarly, a more complete notation for $S$ is $(S, \oplus, \otimes, e)$. The definition of
homomorphism can now be stated more precisely:

(i) $\varphi(a + b) = \varphi(a) \oplus \varphi(b)$,

(ii) $\varphi(a \times b) = \varphi(a) \otimes \varphi(b)$,

(iii) $\varphi(1) = e$.


Etymology. The word homomorphism comes from the Greek homo, mean- 
ing “same,” and morph, meaning “shape” or “form.” Thus, a homomorphism 
carries a commutative ring to another commutative ring of similar form. The 
word isomorphism involves the Greek iso, meaning “equal,” and isomorphic 
rings have identical form. 


Consider the two simple examples above of addition tables arising from
the rings $\mathbb{Z}_2$ and V (the symbol V stands for “parity”). The rings $\mathbb{Z}_2$ and V
are isomorphic, for the function $\varphi : \mathbb{Z}_2 \to$ V, defined by $\varphi(0)$ = even and
$\varphi(1)$ = odd, is an isomorphism, as the reader can quickly check (of course,
you must also check the multiplication tables).

      0   1            even   odd
      1   0            odd    even

Figure 5.1. Addition tables.

Does a homomorphism “preserve 0”? See Lemma 5.17.
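The check left to the reader is a finite one, so it can be carried out mechanically. In the sketch below (an illustration, not part of the text), the ring V is modeled by the two strings `"even"` and `"odd"`, and the function names `add_V` and `mul_V` are our own encodings of its two tables.

```python
from itertools import product

Z2 = [0, 1]
phi = {0: "even", 1: "odd"}   # the proposed isomorphism Z_2 -> V

def add_V(u, v):
    """Addition table of V: like parities sum to even, unlike parities to odd."""
    return "even" if u == v else "odd"

def mul_V(u, v):
    """Multiplication table of V: a product is odd only when both factors are odd."""
    return "odd" if (u == "odd" and v == "odd") else "even"

pairs = list(product(Z2, repeat=2))
preserves_add = all(phi[(a + b) % 2] == add_V(phi[a], phi[b]) for a, b in pairs)
preserves_mul = all(phi[(a * b) % 2] == mul_V(phi[a], phi[b]) for a, b in pairs)
preserves_one = phi[1] == "odd"                       # odd is the identity of V
is_bijection = sorted(phi.values()) == ["even", "odd"]
```

All four flags come out true, which is exactly the statement that the tables of $\mathbb{Z}_2$ and V match under $\varphi$.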

Let $a_1, a_2, \ldots, a_j, \ldots$ be a list with no repetitions of all the elements of a
ring $R$. An addition table for $R$ is a matrix whose $ij$ entry is $a_i + a_j$.

      +    |  a_1         ...   a_j         ...
     ------+-------------------------------------
      a_1  |  a_1 + a_1   ...   a_1 + a_j   ...
      ...  |
      a_i  |  a_i + a_1   ...   a_i + a_j   ...
      ...  |

A multiplication table for $R$ is defined similarly.

The addition and multiplication tables for a ring $R$ depend on the listing of
its elements, so that a ring has many tables. Let $a_1, a_2, \ldots, a_j, \ldots$ be a list of
all the elements of a ring $R$ with no repetitions. If $S$ is a ring and $\varphi : R \to S$ is
a bijection, then $\varphi(a_1), \varphi(a_2), \ldots, \varphi(a_j), \ldots$ is a list of all the elements of $S$
with no repetitions, and so this latter list determines addition and multiplication
tables for $S$. That $\varphi$ is an isomorphism says that if we superimpose the tables
for $R$ (determined by $a_1, a_2, \ldots, a_j, \ldots$) upon the tables for $S$ (determined by
$\varphi(a_1), \varphi(a_2), \ldots, \varphi(a_j), \ldots$), then the tables match. In more detail, if $a_i + a_j$
is the $ij$ entry in the given addition table of $R$, then $\varphi(a_i) + \varphi(a_j)$ is the $ij$
entry of the addition table of $S$. But $\varphi(a_i) + \varphi(a_j) = \varphi(a_i + a_j)$, because $\varphi$ is
an isomorphism. In this sense, isomorphic rings have the same addition tables
and the same multiplication tables (see Figure 5.1). Informally, we say that
a homomorphism preserves addition, multiplication, and 1. Thus, isomorphic
rings are essentially the same, differing only in the notation for their elements
and their operations.

Here are two interesting examples of homomorphisms: the first will be used 
often in this book; the second compares the two different ways we view poly- 
nomials. 

Example 5.14. (i) Reduction mod m.

We didn’t have the language to say it at the time, but Proposition 4.5
sets up a homomorphism $r_m : \mathbb{Z} \to \mathbb{Z}_m$ for any nonnegative integer $m$,
namely, $r_m : n \mapsto [n]$. It’s not an isomorphism because $\mathbb{Z}$ is infinite and
$\mathbb{Z}_m$ is finite (so there can’t be any bijection between them). Another reason
is that $r_m(m) = 0 = r_m(2m)$, so $r_m$ can’t be injective. Is it surjective?

(ii) Form to Function.

In Example 4.31, we saw that every $f(x) \in R[x]$, where $R$ is a commutative
ring, determines its associated polynomial function $f^\# : R \to R$.
The function $\varphi : f \mapsto f^\#$ is a homomorphism $R[x] \to \mathrm{Fun}(R) = R^R$:
as we saw on page 204, addition of polynomials corresponds to pointwise
addition of polynomial functions, $(f + g)^\# = f^\# + g^\#$, and multiplication
of polynomials corresponds to pointwise multiplication of polynomial
functions, $(fg)^\# = f^\# g^\#$. Is $\varphi$ an isomorphism? No, because it’s
not surjective in general: not every function on $R$ need be a polynomial function. Is $\varphi$
an isomorphism between $R[x]$ and the subring $\mathrm{im}\,\varphi = \mathrm{Poly}(R)$ of $R^R$
consisting of all polynomial functions? That depends on $R$. We’ve seen,
for example, that $f(x) = x^7 + 2x - 1$ and $g(x) = 3x + 6$ give the
same function ($f^\# = g^\#$) when $R = \mathbb{Z}_7$. But we’ll see, in Theorem 6.20,
that if $R$ is an infinite field (as it almost always is in high school), then
$\varphi : R[x] \to \mathrm{Poly}(R)$ is an isomorphism. ▲
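Both parts of this example lend themselves to a quick numerical check. The sketch below is illustrative only: it tests reduction mod $m$ on a small range of integers (a spot check, not a proof), and it compares $x^7 + 2x - 1$ with $3x + 6$ over $\mathbb{Z}_7$ both as coefficient lists and as functions. The helper names `r` and `evaluate` are our own.

```python
def r(n, m):
    """Reduction mod m: the integer n goes to its class [n], represented by n % m."""
    return n % m

def evaluate(coeffs, a, m):
    """Value of s_0 + s_1 a + ... + s_n a^n in Z_m, computed by Horner's rule."""
    value = 0
    for s in reversed(coeffs):
        value = (value * a + s) % m
    return value

m = 7
sample = range(-20, 21)
# Part (i): r preserves sums, products, and 1 on the sampled integers.
reduction_preserves_add = all(r(a + b, m) == (r(a, m) + r(b, m)) % m for a in sample for b in sample)
reduction_preserves_mul = all(r(a * b, m) == (r(a, m) * r(b, m)) % m for a in sample for b in sample)
reduction_not_injective = r(m, m) == r(2 * m, m) == 0

# Part (ii): f(x) = x^7 + 2x - 1 and g(x) = 3x + 6, coefficients listed from degree 0 up.
f = [c % m for c in [-1, 2, 0, 0, 0, 0, 0, 1]]
g = [c % m for c in [6, 3]]
same_function = all(evaluate(f, a, m) == evaluate(g, a, m) for a in range(m))
same_polynomial = f == g
```

Here `same_function` is true while `same_polynomial` is false: the two polynomials differ in $\mathbb{Z}_7[x]$ but have the same image in $\mathrm{Poly}(\mathbb{Z}_7)$.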

As with all important definitions in mathematics, the idea of homomor- 
phism existed long before the name. Here are some examples that you’ve en- 
countered so far. 

Example 5.15. (i) Complex conjugation $z = a + ib \mapsto \bar{z} = a - ib$ is a
homomorphism $\mathbb{C} \to \mathbb{C}$, because $\bar{1} = 1$, $\overline{z + w} = \bar{z} + \bar{w}$, and $\overline{zw} = \bar{z}\,\bar{w}$;
it is a bijection because $\bar{\bar{z}} = z$ (so that it is its own inverse) and, therefore,
complex conjugation is an isomorphism.

(ii) Let $D$ be a domain with fraction field $F = \mathrm{Frac}(D)$. In Theorem 5.5, we
proved that $D' = \{[a, 1] : a \in D\}$ is a subring of $F$. We can now identify
$D'$ with $D$, for the function $\varphi : D \to D'$, given by $\varphi(a) = [a, 1] = a/1$,
is an isomorphism.

(iii) In the proof of Corollary 5.9, we “identified” an element $r$ in a commutative
ring $R$ with the constant polynomial $(r, 0, 0, \ldots)$. We said that
$R$ is a subring of $R[x]$, but that is not the truth. The function $\varphi : R \to R[x]$,
defined by $\varphi(r) = (r, 0, 0, \ldots)$, is a homomorphism, and
$R' = \{(r, 0, 0, \ldots) : r \in R\}$ is a subring of $R[x]$ isomorphic to $R$, and
$\varphi : R \to R'$ is an isomorphism.

(iv) If $S$ is a subring of a commutative ring $R$, then the inclusion $i : S \to R$ is
a ring homomorphism (this is one reason why we insist that the identity
of $R$ lie in $S$).

(v) Recall Example 4.47: if $X$ is a set, then $2^X$ is the Boolean ring of all the
subsets of $X$, where addition is symmetric difference, multiplication is
intersection, and the identity is $X$. If $Y$ is a proper subset of $X$, then $2^Y$ is
not a subring of $2^X$, for the identity of $2^Y$ is $Y$, not $X$. Thus, the inclusion
$i : 2^Y \to 2^X$ is not a homomorphism, even though $i(a + b) = i(a) + i(b)$
and $i(ab) = i(a)i(b)$. Therefore, the part of the definition of homomorphism
requiring identity elements be preserved is not redundant. ▲
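Part (v) can be made concrete with a small set. In the following sketch (illustrative only; the sets $X = \{1, 2, 3\}$ and $Y = \{1, 2\}$ and the helper `powerset` are our own choices), subsets are Python `frozenset`s, symmetric difference is `^`, and intersection is `&`.

```python
from itertools import combinations

def powerset(universe):
    """All subsets of a finite set, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

X = frozenset({1, 2, 3})
Y = frozenset({1, 2})          # a proper subset of X
two_Y = powerset(Y)
two_X = powerset(X)

def inclusion(a):
    """The inclusion i: 2^Y -> 2^X sends a subset of Y to the same subset of X."""
    return a

preserves_add = all(inclusion(a ^ b) == inclusion(a) ^ inclusion(b) for a in two_Y for b in two_Y)
preserves_mul = all(inclusion(a & b) == inclusion(a) & inclusion(b) for a in two_Y for b in two_Y)
Y_is_identity_of_two_Y = all(a & Y == a for a in two_Y)
Y_is_identity_of_two_X = all(a & Y == a for a in two_X)
preserves_identity = inclusion(Y) == X
```

The first three flags are true and the last two are false: the inclusion respects both operations, yet it sends the identity $Y$ of $2^Y$ to an element that is not the identity $X$ of $2^X$.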

Example 5.16. Example 4.31 on page 157 shows, for a commutative ring $R$
and a set $X$, that the family $R^X$ of all functions from $X$ to $R$, equipped with
pointwise addition and multiplication, is a commutative ring. We’ve also used
the notation $2^X$ in Example 4.47 on page 166 to stand for the Boolean ring of
all subsets of a set $X$. The goal of this example is to prove that $2^X$ and $(\mathbb{Z}_2)^X$
are isomorphic rings.

The basic idea is to associate every subset $A \subseteq X$ with its characteristic
function $f_A \in (\mathbb{Z}_2)^X$, defined by

$$f_A(x) = \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{if } x \notin A. \end{cases}$$

The characteristic function $f_A$ is sometimes called the indicator function, for it tells
you whether an element $x \in X$ is or is not in $A$.

We claim that $\varphi : 2^X \to (\mathbb{Z}_2)^X$, defined by

$$\varphi(A) = f_A,$$

is an isomorphism.

This example is rather dense. It’s a good idea to pick a concrete set,
say $X = \{1, 2, 3\}$, and work out the characteristic function for each of the 8
subsets of $X$.

First, $\varphi$ is a bijection:

(i) $\varphi$ is injective: if $A, B \subseteq X$ and $\varphi(A) = \varphi(B)$, then $f_A = f_B$: for any
$x \in X$, we have $f_A(x) = 1$ if and only if $f_B(x) = 1$. Thus, $x \in A$ if and
only if $x \in B$; that is, $A = B$.

(ii) $\varphi$ is surjective: given a function $g : X \to \mathbb{Z}_2$, define $A[g] \subseteq X$ by

$$A[g] = \{x \in X : g(x) = 1\}.$$

It is easy to check that $\varphi(A[g]) = f_{A[g]} = g$.

Finally, $\varphi$ is a homomorphism:

(i) $\varphi$ maps the identity to the identity: the (multiplicative) identity of $2^X$ is $X$,
and $\varphi(X) = f_X$. Now $f_X(x) = 1$ for all $x \in X$, because every element
of $X$ lies in $X$! Hence, $f_X$ is the (multiplicative) identity in $(\mathbb{Z}_2)^X$.

(ii) $\varphi$ preserves addition: we must show that

$$\varphi(A + B) = \varphi(A) + \varphi(B)$$

for all $A, B \subseteq X$. Consider the following table:

      f_A(x)    f_B(x)    f_A(x) + f_B(x)
        1         1              0
        1         0              1
        0         1              1
        0         0              0

Recall the different meanings of + in this equation: $A + B$ is symmetric difference,
and $\varphi(A) + \varphi(B)$ is pointwise addition.

It follows that $f_A + f_B = f_{A+B}$, for each of the functions $f_A + f_B$
and $f_{A+B}$ has value 1 if $x \in (A \cup B) - (A \cap B)$ and value 0 otherwise.
Therefore,

$$\varphi(A + B) = f_{A+B} = f_A + f_B = \varphi(A) + \varphi(B).$$

(iii) $\varphi$ preserves multiplication: we must show that $\varphi(AB) = \varphi(A)\varphi(B)$ for
all $A, B \subseteq X$. The proof is similar to that in part (ii), using a table for
$f_A f_B$ to prove that $f_A f_B = f_{AB}$; you will supply the details in Exercise 5.39
on page 212.

We conclude that $2^X$ and $(\mathbb{Z}_2)^X$ are isomorphic. In Exercise 5.40 on page 212,
we will see that if $|X| = n$, then $(\mathbb{Z}_2)^X \cong (\mathbb{Z}_2)^n$, the ring of all $n$-tuples
having coordinates in $\mathbb{Z}_2$ with pointwise operations. ▲
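For the concrete set $X = \{1, 2, 3\}$, every step of this example can be verified by enumeration. The sketch below is only an illustration under our own encoding: a function $X \to \mathbb{Z}_2$ is stored as a tuple of bits indexed by the elements of $X$ in increasing order, and the names `phi`, `add_fn`, and `mul_fn` are ours.

```python
from itertools import combinations

X = (1, 2, 3)
subsets = [frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)]

def phi(A):
    """Characteristic function of A, stored as the tuple (f_A(1), f_A(2), f_A(3))."""
    return tuple(1 if x in A else 0 for x in X)

def add_fn(f, g):
    """Pointwise addition in (Z_2)^X."""
    return tuple((a + b) % 2 for a, b in zip(f, g))

def mul_fn(f, g):
    """Pointwise multiplication in (Z_2)^X."""
    return tuple((a * b) % 2 for a, b in zip(f, g))

images = {phi(A) for A in subsets}
is_injective = len(images) == len(subsets)
is_surjective = len(images) == 2 ** len(X)
preserves_identity = phi(frozenset(X)) == (1, 1, 1)
# A ^ B is symmetric difference (addition in 2^X); A & B is intersection (multiplication).
preserves_add = all(phi(A ^ B) == add_fn(phi(A), phi(B)) for A in subsets for B in subsets)
preserves_mul = all(phi(A & B) == mul_fn(phi(A), phi(B)) for A in subsets for B in subsets)
```

Since each tuple of bits is also an element of $(\mathbb{Z}_2)^3$, the same computation illustrates the remark that $2^X$ can be viewed as a ring of bitstrings when $X$ is finite.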


How to Think About It. There are two strategies in trying to show that a
homomorphism $\varphi : R \to S$ is an isomorphism. One way is to show that $\varphi$ is
a bijection; that is, it is injective and surjective. A second way is to show that
the inverse function $\varphi^{-1} : S \to R$ exists (see Exercise 5.30 on page 211 and
Exercise 5.39(ii) on page 212).


Here are some properties of homomorphisms. 
Lemma 5.17. Let $R$ and $S$ be commutative rings, let $\varphi : R \to S$ be a homomorphism,
and let $a \in R$.

(i) $\varphi(0) = 0$.

(ii) $\varphi(-a) = -\varphi(a)$.

(iii) $\varphi(na) = n\varphi(a)$ for all $n \in \mathbb{Z}$.

(iv) $\varphi(a^n) = \varphi(a)^n$ for all $n \in \mathbb{N}$.

(v) If $a$ is a unit in $R$, then $\varphi(a)$ is a unit in $S$, and $\varphi(a^{-1}) = \varphi(a)^{-1}$.

Proof. (i) Since $0 + 0 = 0$, we have $\varphi(0) = \varphi(0 + 0) = \varphi(0) + \varphi(0)$. Now
subtract $\varphi(0)$ from both sides.

(ii) Since $0 = -a + a$, we have $0 = \varphi(0) = \varphi(-a) + \varphi(a)$. But Proposition 4.35
says that negatives are unique: there is exactly one $s \in S$,
namely, $s = -\varphi(a)$, with $s + \varphi(a) = 0$. Hence, $\varphi(-a) = -\varphi(a)$.

(iii) If $n \ge 0$, use induction to prove that $\varphi(na) = n\varphi(a)$. Now use (ii):
$\varphi(-na) = -\varphi(na) = -n\varphi(a)$.

(iv) Use induction to show that $\varphi(a^n) = \varphi(a)^n$ for all $n \ge 0$.

(v) By Proposition 4.35, there is exactly one $b \in R$, namely, $b = a^{-1}$, with
$ab = 1$. Similarly in $S$; since $\varphi(a)\varphi(b) = \varphi(ab) = \varphi(1) = 1$, we have
$\varphi(a^{-1}) = \varphi(a)^{-1}$. ■


Example 5.18. If $\varphi : A \to B$ is a bijection between finite sets $A$ and $B$, then
they have the same number of elements. In particular, two finite isomorphic
commutative rings have the same number of elements. We now show that the
converse is false: there are finite commutative rings with the same number of
elements that are not isomorphic.

Recall Exercise 4.55 on page 165: there is a field, $\mathbb{F}_4$, having exactly four
elements. If $a \in \mathbb{F}_4$ and $a \ne 0$, then $a^2 \ne 0$, for $\mathbb{F}_4$ is a domain (even
a field), and so the product of nonzero elements is nonzero. Suppose there
were an isomorphism $\varphi : \mathbb{F}_4 \to \mathbb{Z}_4$. Since $\varphi$ is surjective, there is $a \in \mathbb{F}_4$
with $\varphi(a) = 2$. Hence, $\varphi(a^2) = \varphi(a)^2 = 2^2 = 0$. This contradicts $\varphi$ being
injective, for $a^2 \ne 0$ and $\varphi(a^2) = 0 = \varphi(0)$. ▲
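Because both rings have only four elements, the conclusion can also be confirmed by brute force over all 24 bijections. The sketch below makes an assumption that is not in the text: it models $\mathbb{F}_4$ as pairs $(a, b)$ standing for $a + b\omega$ with $a, b \in \mathbb{Z}_2$ and $\omega^2 = \omega + 1$, which is one standard construction and may differ in presentation from the one in Exercise 4.55. The function names are our own.

```python
from itertools import permutations, product

# Assumed model of F_4: (a, b) stands for a + b*w, with a, b in Z_2 and w^2 = w + 1.
F4 = [(0, 0), (1, 0), (0, 1), (1, 1)]
Z4 = [0, 1, 2, 3]

def f4_add(u, v):
    return ((u[0] + v[0]) % 2, (u[1] + v[1]) % 2)

def f4_mul(u, v):
    a, b = u
    c, d = v
    # (a + bw)(c + dw) = ac + (ad + bc)w + bd*w^2, and w^2 = w + 1.
    return ((a * c + b * d) % 2, (a * d + b * c + b * d) % 2)

def is_homomorphism(phi):
    """Check that phi: F_4 -> Z_4 preserves 1, sums, and products."""
    if phi[(1, 0)] != 1:
        return False
    return all(
        phi[f4_add(u, v)] == (phi[u] + phi[v]) % 4
        and phi[f4_mul(u, v)] == (phi[u] * phi[v]) % 4
        for u, v in product(F4, repeat=2)
    )

# Try every bijection F_4 -> Z_4 and keep those that are homomorphisms.
isomorphisms = [
    images for images in permutations(Z4)
    if is_homomorphism(dict(zip(F4, images)))
]
nonzero_squares = [f4_mul(u, u) for u in F4 if u != (0, 0)]
```

The list `isomorphisms` is empty, and no entry of `nonzero_squares` is zero, whereas $2^2 = 0$ in $\mathbb{Z}_4$; this is the obstruction used in the example.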


Exercises 

5.29 Let $R$ and $S$ be commutative rings, and let $\varphi : R \to S$ be an isomorphism.

(i) If $R$ is a field, prove that $S$ is a field.

(ii) If $R$ is a domain, prove that $S$ is a domain.

5.30 *

(i) If $\varphi$ is an isomorphism, prove that its inverse function $\varphi^{-1} : S \to R$ is also an
isomorphism.

(ii) Show that $\varphi$ is an isomorphism if and only if $\varphi$ has an inverse function $\varphi^{-1}$.

5.31 (i) Show that the composite of two homomorphisms (isomorphisms) is again a
homomorphism (an isomorphism).

(ii) Show that $R \cong S$ defines an equivalence relation on the class of all commutative
rings.
5.32 Prove that the weird integers $W$ (see Exercise 4.35 on page 158) is not isomorphic
to $\mathbb{Z}$.

5.33 Recall that $\mathbb{Z}[\omega] = \{a + b\omega : a, b \in \mathbb{Z}\}$, where $\omega = -\tfrac{1}{2} + i\tfrac{\sqrt{3}}{2}$. Show that
$\varphi : \mathbb{Z}[\omega] \to \mathbb{Z}[\omega]$, defined by

$$\varphi : a + b\omega \mapsto a + b\omega^2,$$

is a homomorphism. Is $\varphi$ an isomorphism?

5.34 If $R$ is a commutative ring and $a \in R$, is the function $\varphi : R \to R$, defined by
$\varphi : r \mapsto ar$, a homomorphism? Why?

5.35 Prove that two fields having exactly four elements are isomorphic.

Hint. First prove that $1 + 1 = 0$.

5.36 * Let $k$ be a field that contains $\mathbb{Z}_p$ as a subfield (e.g., $k = \mathbb{Z}_p(x)$). For every
integer $n > 0$, show that the function $\varphi_n : k \to k$, given by $\varphi_n(a) = a^{p^n}$, is an
injective homomorphism. If $k$ is finite, show that $\varphi_n$ is an isomorphism.

5.37 * If $R$ is a field, show that $R \cong \mathrm{Frac}(R)$. More precisely, show that the homomorphism
$\varphi : R \to \mathrm{Frac}(R)$, given by $\varphi : r \mapsto [r, 1]$, is an isomorphism.

5.38 * Recall, when we constructed the field $\mathrm{Frac}(D)$ of a domain $D$, that $[a, b]$ denoted
the equivalence class of $(a, b)$, and that we then reverted to the usual notation:
$[a, b] = a/b$.

(i) If $R$ and $S$ are domains and $\varphi : R \to S$ is an isomorphism, prove that

$$[a, b] \mapsto [\varphi(a), \varphi(b)]$$

is an isomorphism $\mathrm{Frac}(R) \to \mathrm{Frac}(S)$.

(ii) Prove that a field $k$ containing an isomorphic copy of $\mathbb{Z}$ as a subring must
contain an isomorphic copy of $\mathbb{Q}$.

(iii) Let $R$ be a domain and let $\varphi : R \to k$ be an injective homomorphism, where
$k$ is a field. Prove that there exists a unique homomorphism $\Phi : \mathrm{Frac}(R) \to k$
extending $\varphi$; that is, $\Phi|R = \varphi$.

5.39 * In Example 5.16, we proved that if $X$ is a set, then the function $\varphi : 2^X \to (\mathbb{Z}_2)^X$,
given by $\varphi(A) = f_A$, the characteristic function of $A$, is an isomorphism.

(i) Complete the proof in Example 5.16 by showing that $\varphi(AB) = \varphi(A)\varphi(B)$
for all $A, B \in 2^X$.

(ii) Give another proof that $2^X \cong (\mathbb{Z}_2)^X$ by showing that $\varphi^{-1}$ exists.

5.40 * If $n$ is a positive integer, define $(\mathbb{Z}_2)^n$ to be the set of all $n$-tuples $(a_1, \ldots, a_n)$
with $a_i \in \mathbb{Z}_2$ for all $i$ (such $n$-tuples are called bitstrings).

(i) Prove that $(\mathbb{Z}_2)^n$ is a commutative ring with pointwise operations

$$(a_1, \ldots, a_n) + (b_1, \ldots, b_n) = (a_1 + b_1, \ldots, a_n + b_n)$$

and

$$(a_1, \ldots, a_n)(b_1, \ldots, b_n) = (a_1 b_1, \ldots, a_n b_n).$$

(ii) If $X$ is a finite set with $|X| = n$, prove that $(\mathbb{Z}_2)^X \cong (\mathbb{Z}_2)^n$. Conclude, in
this case, that $2^X \cong (\mathbb{Z}_2)^n$ (see Example 5.16).
Extensions of Homomorphisms

Suppose that a ring $R$ is a subring of a commutative ring $E$ with inclusion
$i : R \to E$. Given a homomorphism $\varphi : R \to S$, an extension $\Phi$ of $\varphi$ is a
homomorphism $\Phi : E \to S$ with restriction $\Phi|R = \Phi i = \varphi$.

[Diagram: the inclusion $i : R \to E$, the homomorphism $\varphi : R \to S$, and the extension $\Phi : E \to S$, with $\Phi i = \varphi$.]

If $\Phi : U \to Y$ is any function, then its restriction $\Phi|X$ to a subset $X \subseteq U$
is equal to the composite $\Phi i$, where $i : X \to U$ is the inclusion.

Some obvious questions about extensions are

(i) Can we extend $\varphi : R \to S$ to a homomorphism $\Phi : E \to S$?

(ii) Can we extend $\varphi : R \to S[x]$ to a homomorphism $\Phi : E \to S[x]$?

Theorems 5.19 and 5.20 below answer the first question when $E = R[x]$
and $E = R[x_1, \ldots, x_n]$; Corollary 5.22 answers the second question when
$E = R[x]$. The basic idea is to let $\varphi$ handle the elements of $R$, then specify
what happens to $x$, and then use the definition of homomorphism to make sure
that the extension preserves addition and multiplication.

Even though the coming proof is routine, we give full details because of the
importance of the result.

Theorem 5.19. Let $R$ and $S$ be commutative rings, and let $\varphi : R \to S$ be a
homomorphism. If $s \in S$, then there exists a unique homomorphism

$$\Phi : R[x] \to S$$

with $\Phi(x) = s$ and $\Phi(r) = \varphi(r)$ for all $r \in R$.

Proof. If $f(x) = \sum_i r_i x^i = r_0 + r_1 x + \cdots + r_n x^n$, define $\Phi : R[x] \to S$ by

$$\Phi(f) = \varphi(r_0) + \varphi(r_1)s + \cdots + \varphi(r_n)s^n.$$

Proposition 5.6, uniqueness of coefficients, shows that $\Phi$ is a well-defined
function, and the formula shows that $\Phi(x) = s$ and $\Phi(r) = \varphi(r)$ for all
$r \in R$.

We now prove that $\Phi$ is a homomorphism. First, $\Phi(1) = \varphi(1) = 1$, because
$\varphi$ is a homomorphism.

Second, if $g(x) = a_0 + a_1 x + \cdots + a_m x^m$, then

$$\begin{aligned}
\Phi(f + g) &= \Phi\Big(\sum_i (r_i + a_i)x^i\Big) \\
&= \sum_i \varphi(r_i + a_i)s^i \\
&= \sum_i \big(\varphi(r_i) + \varphi(a_i)\big)s^i \\
&= \sum_i \varphi(r_i)s^i + \sum_i \varphi(a_i)s^i \\
&= \Phi(f) + \Phi(g).
\end{aligned}$$

Third, let $f(x)g(x) = \sum_k c_k x^k$, where $c_k = \sum_{i+j=k} r_i a_j$. Then

$$\begin{aligned}
\Phi(fg) &= \Phi\Big(\sum_k c_k x^k\Big) \\
&= \sum_k \varphi(c_k)s^k \\
&= \sum_k \varphi\Big(\sum_{i+j=k} r_i a_j\Big)s^k \\
&= \sum_k \Big(\sum_{i+j=k} \varphi(r_i)\varphi(a_j)\Big)s^k.
\end{aligned}$$

On the other hand,

$$\Phi(f)\Phi(g) = \Big(\sum_i \varphi(r_i)s^i\Big)\Big(\sum_j \varphi(a_j)s^j\Big) = \sum_k \Big(\sum_{i+j=k} \varphi(r_i)\varphi(a_j)\Big)s^k.$$

Uniqueness of $\Phi$ is easy: if $\Theta : R[x] \to S$ is a homomorphism with $\Theta(x) = s$
and $\Theta(r) = \varphi(r)$ for all $r \in R$, then

$$\begin{aligned}
\Theta(r_0 + r_1 x + \cdots + r_d x^d) &= \varphi(r_0) + \varphi(r_1)s + \cdots + \varphi(r_d)s^d \\
&= \Phi(r_0 + r_1 x + \cdots + r_d x^d). \; ■
\end{aligned}$$
This theorem generalizes to polynomial rings in several variables. 


Theorem 5.20. Let $R$ and $S$ be commutative rings and $\varphi : R \to S$ a homomorphism.
If $s_1, \ldots, s_n \in S$, then there exists a unique homomorphism

$$\Phi : R[x_1, \ldots, x_n] \to S$$

with $\Phi(x_i) = s_i$ for all $i$ and $\Phi(r) = \varphi(r)$ for all $r \in R$.


Proof. The proof is by induction on $n \ge 1$. The base step is Theorem 5.19. For
the inductive step, let $n > 1$ and define $A = R[x_1, \ldots, x_{n-1}]$. The inductive
hypothesis gives a homomorphism $\psi : A \to S$ with $\psi(x_i) = s_i$ for all $i \le n - 1$
and $\psi(r) = \varphi(r)$ for all $r \in R$. The base step gives a homomorphism
$\Psi : A[x_n] \to S$ with $\Psi(x_n) = s_n$ and $\Psi(a) = \psi(a)$ for all $a \in A$. The
result follows, because $R[x_1, \ldots, x_n] = A[x_n]$, $\Psi(x_i) = \psi(x_i) = s_i$ for all
$i \le n - 1$, $\Psi(x_n) = s_n$, and $\Psi(r) = \psi(r) = \varphi(r)$ for all $r \in R$. ■


How to Think About It. There is an analogy between Theorem 5.20 and
an important theorem of linear algebra, Theorem A.43 in Appendix A.4: Let
$V$ and $W$ be vector spaces over a field $k$; if $v_1, \ldots, v_n$ is a basis of $V$ and
$w_1, \ldots, w_n \in W$, then there exists a unique linear transformation $T : V \to W$
with $T(v_i) = w_i$ for all $i$ (linear transformations are homomorphisms of vector
spaces). The theorem is actually the reason why matrices can describe linear
transformations.
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Here is a familiar special case of Theorem 5.19. 

Definition. If R is a commutative ring and a € R, then evaluation at a is the 
function e a : — > R given by e a (f) = /(a); that is, 



So, in this language, we have: 

Corollary 5.21. If R is a commutative ring and a € R, then e a \ — > R, 

evaluation at a , is a homomorphism. 

Proof. In the notation of Theorem 5.19, set R = .S', (p = 1 r (the identity 
function R — > R), and s = a G R. The homomorphism O: R[x] — > R is e a , 
which sends r\x l into rja 1 . ■ 

As an illustration of Corollary 5.21, if $f, g \in R[x]$ and $h = fg$, then

$$h(a) = e_a(h) = e_a(f)e_a(g) = f(a)g(a).$$

In other words, we get the same element of $R$ if we first multiply polynomials
in $R[x]$ and then substitute $a$ for $x$, or if we first substitute $a$ for $x$ and then
multiply the elements $f(a)$ and $g(a)$. For example, if $R$ is a commutative
ring and $a \in R$, then $f(x) = q(x)g(x) + r(x)$ in $R[x]$ implies $f(a) =
q(a)g(a) + r(a)$ in $R$.

Let’s return to question (ii) on page 213. Given a homomorphism
$\varphi\colon R \to S$, can we extend it to a homomorphism $R[x] \to S[x]$? The basic
idea? Let $\varphi$ handle the coefficients and send $x$ to $x$.

Corollary 5.22. If $R$ and $S$ are commutative rings and $\varphi\colon R \to S$ is a
homomorphism, then there is a unique homomorphism $\varphi^*\colon R[x] \to S[x]$ given
by

$$\varphi^*\colon r_0 + r_1 x + r_2 x^2 + \cdots \mapsto \varphi(r_0) + \varphi(r_1)x + \varphi(r_2)x^2 + \cdots.$$

Moreover, $\varphi^*$ is an isomorphism if $\varphi$ is.

Proof. The existence of the homomorphism $\varphi^*$ is a special case of Theorem 5.19.
More precisely, consider the following diagram in which $\iota\colon R \to R[x]$
and $\lambda\colon S \to S[x]$ are the usual inclusions viewing elements of $R$ and
of $S$ as constant polynomials. The role of $\varphi\colon R \to S$ is now played by the
composite $\lambda\varphi\colon R \to S[x]$, namely, $r \mapsto (\varphi(r), 0, 0, \ldots)$.

[Diagram: the triangle formed by $\iota\colon R \to R[x]$, $\lambda\varphi\colon R \to S[x]$, and $\varphi^*\colon R[x] \to S[x]$.]

If $\varphi$ is an isomorphism, then $\varphi^*$ is the inverse of the extension of $\varphi^{-1}$. ■

If $f^{\#}$ is the polynomial function determined by $f$,
then $e_a(f) = f^{\#}(a)$.

No calculus is needed for
this exercise.


Example 5.23. If $r_m\colon \mathbb{Z} \to \mathbb{Z}_m$ is reduction mod $m$, that is, $r_m(a) = [a]$, then
the homomorphism $r_m^*\colon \mathbb{Z}[x] \to \mathbb{Z}_m[x]$ reduces each coefficient of a polynomial
mod $m$:

$$r_m^*\colon a_0 + a_1 x + a_2 x^2 + \cdots \mapsto [a_0] + [a_1]x + [a_2]x^2 + \cdots.$$

We will usually write $a$ instead of $[a]$ when using $r_m^*$. ▲
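Coefficientwise reduction is easy to experiment with. The following Python sketch assumes polynomials are stored as coefficient lists and uses the hypothetical helper names `reduce_poly` and `poly_mul`. It checks, for one pair of polynomials and $m = 5$, that multiplying in $\mathbb{Z}[x]$ and then reducing agrees with reducing first and then multiplying in $\mathbb{Z}_5[x]$.

```python
def reduce_poly(f, m):
    """r_m^*: reduce every coefficient of f (a list [a_0, a_1, ...]) mod m."""
    return [a % m for a in f]

def poly_mul(f, g, m=None):
    """Product of coefficient lists; coefficients are reduced mod m when m is given."""
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += a * b
    return h if m is None else [c % m for c in h]

m = 5
f = [7, -3, 12]   # 7 - 3x + 12x^2
g = [4, 9]        # 4 + 9x

left = reduce_poly(poly_mul(f, g), m)                      # multiply in Z[x], then reduce
right = poly_mul(reduce_poly(f, m), reduce_poly(g, m), m)  # reduce, then multiply in Z_5[x]
print(reduce_poly(f, m), left == right)
```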

Example 5.24. Complex conjugation extends to an isomorphism $\mathbb{C}[x] \to \mathbb{C}[x]$
in which every polynomial is mapped to the polynomial obtained by taking the
complex conjugate of each coefficient. (We have already used this construction
in Theorem 3.12 and in Exercises 3.43-3.45 on page 106.) ▲


Exercises 

5.41 If $R$ is a commutative ring, prove that $R[x, y] \cong R[y, x]$. In fact, prove that there
is an isomorphism $\Phi$ with $\Phi(x) = y$, $\Phi(y) = x$, and $\Phi(r) = r$ for all $r \in R$.

Hint. Use Theorem 5.20.

If you look very carefully at the definitions, you’ll see that $R[x, y]$ and $R[y, x]$
are different rings. Recall that elements $a$ in a ring $A$ correspond to $(a, 0, \ldots)$
in $A[x]$. In particular, the element $x \in R[x]$ corresponds to $(x, 0, 0, \ldots)$ in $R[x][y]$;
that is, we have $x = ((0, 1, 0, \ldots), 0, 0, \ldots)$ so that in $R[x][y]$ the element $x$ has
$(0, 1, 0, \ldots)$ in coordinate 1. This is not the same element as $x$ in $R[y][x]$, which
has 1 sitting in coordinate 1. However, this exercise allows you to relax and regard
these polynomial rings as the same.

5.42 *

(i) If $R$ is a commutative ring and $c \in R$, prove that there is a homomorphism
$\varphi\colon R[x] \to R[x]$ with $\varphi(x) = x + c$ and $\varphi(r) = r$ for all $r \in R$; that is,
$\varphi(\sum_i r_i x^i) = \sum_i r_i (x + c)^i$. Is $\varphi$ an isomorphism?

(ii) If $\deg(f) = n$, show that

$$\varphi(f) = f(c) + f'(c)(x + c) + \frac{f''(c)}{2!}(x + c)^2 + \cdots + \frac{f^{(n)}(c)}{n!}(x + c)^n,$$

where $f'(x)$ is the formal derivative of $f$ defined in Exercise 5.15 on page 202.

Kernel, Image, and Ideals 

There’s a great deal of talk about “modeling” in high school mathematics: we
wonder whether a given statement is true, and the idea is to see whether it
holds in some model. A homomorphism $R \to S$ is a good illustration of this
idea; it transports the structure of $R$ to the structure of $S$, so that we may test
whether a statement in $R$ is true by asking whether its analog in the “model”
$S$ is true. For example, is $-1$ a square in $\mathbb{Z}$; is there $k \in \mathbb{Z}$ with $-1 = k^2$?
Now we can list all the squares in $\mathbb{Z}_3$: $0^2 = 0$; $1^2 = 1$; $2^2 = 4 = 1$, and we
see that $-1 = 2$ is not a square. But if $-1 = k^2$ in $\mathbb{Z}$, then reduction mod 3,
the homomorphism $r_3\colon \mathbb{Z} \to \mathbb{Z}_3$ taking $a \mapsto [a]$ (see Example 5.14(i)), would
give

$$r_3(-1) = r_3(k^2) = r_3(k)^2,$$

contradicting $-1$ not being a square in $\mathbb{Z}_3$.
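The list of squares in $\mathbb{Z}_3$ used in the argument above can be reproduced in a few lines of Python. This is a small sketch; the variable names are chosen here for illustration.

```python
m = 3
squares = sorted({(k * k) % m for k in range(m)})  # all squares in Z_3
minus_one = (-1) % m                               # the class of -1 in Z_3
print(squares, minus_one, minus_one in squares)
```

Changing `m` gives a quick test of whether $-1$ is a square in other rings $\mathbb{Z}_m$.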
We should be cautious when viewing a homomorphism $\varphi\colon R \to S$ as modeling
a ring $R$. First, $\varphi$ may not give a faithful model of $R$, thereby losing some
information; for example, $\varphi$ might take different elements of $R$ to the same
element of $S$. Also, some information might get missed: there may be elements
of $S$ that don’t get “hit” by an element of $R$. The information that’s lost is
called the kernel of $\varphi$; the information that’s hit is called its image.

Definition. If $\varphi\colon R \to S$ is a homomorphism, then its kernel is

$$\ker\varphi = \{a \in R \text{ with } \varphi(a) = 0\} \subseteq R,$$

and its image is

$$\operatorname{im}\varphi = \{s \in S : s = \varphi(a) \text{ for some } a \in R\} \subseteq S.$$

Here are the first properties of these subsets. Note that Lemma 5.17 says,
for every homomorphism $\varphi\colon R \to S$, that $0 \in \ker\varphi$ and $0 \in \operatorname{im}\varphi$. In general,
$\operatorname{im}\varphi$ is a subset of $S$ but, as for any function, $\varphi$ is surjective if and only if
$\operatorname{im}\varphi = S$.


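Proposition 5.25. Let $R$ and $S$ be rings and $\varphi\colon R \to S$ a homomorphism.

(i) $\operatorname{im}\varphi = \{\varphi(r) : r \in R\}$ is a subring of $S$.

(ii) If $a, b \in \ker\varphi$, then $a + b \in \ker\varphi$.

(iii) If $a \in \ker\varphi$ and $r \in R$, then $ra \in \ker\varphi$.

Proof. (i) To see that $\operatorname{im}\varphi$ is a subring of $S$, note first that $1 \in \operatorname{im}\varphi$, because
$\varphi(1) = 1$. If $s, t \in \operatorname{im}\varphi$, then there are $a, b \in R$ with $s = \varphi(a)$ and
$t = \varphi(b)$. Hence, $s + t = \varphi(a) + \varphi(b) = \varphi(a + b) \in \operatorname{im}\varphi$, and
$st = \varphi(a)\varphi(b) = \varphi(ab) \in \operatorname{im}\varphi$. Therefore, $\operatorname{im}\varphi$ is a subring of $S$.

(ii) If $a, b \in \ker\varphi$, then $\varphi(a) = 0 = \varphi(b)$. Hence, $\varphi(a + b) = \varphi(a) + \varphi(b) =
0 + 0 = 0$, and $a + b \in \ker\varphi$.

(iii) If $a \in \ker\varphi$, then $\varphi(a) = 0$. Hence, $\varphi(ra) = \varphi(r)\varphi(a) = \varphi(r) \cdot 0 = 0$,
and so $ra \in \ker\varphi$. ■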

Here are some examples of kernels and images.

Example 5.26. (i) If $\varphi\colon R \to S$ is an isomorphism, then $\ker\varphi = \{0\}$ and
$\operatorname{im}\varphi = S$.

(ii) If $\varphi$ is injective, then $\ker\varphi = \{0\}$, for if $r \ne 0$, then $\varphi(r) \ne \varphi(0) = 0$.
We will soon see that the converse is true, so that $\varphi$ is injective if and only if
$\ker\varphi = \{0\}$.

(iii) If $r_m\colon \mathbb{Z} \to \mathbb{Z}_m$ is reduction mod $m$, then $\ker r_m$ consists of all the multiples
of $m$.

(iv) Let $k$ be a commutative ring, let $a \in k$, and let $e_a\colon k[x] \to k$ be the
evaluation homomorphism $f(x) \mapsto f(a)$. Now $e_a$ is always surjective; if
$b \in k$, then $b = e_a(f)$, where $f(x) = x - a + b$. By definition, $\ker e_a$
consists of all those polynomials $g$ for which $g(a) = 0$.

In particular, let $\varphi\colon \mathbb{R}[x] \to \mathbb{C}$ be defined by $\varphi(x) = i$ and $\varphi(a) = a$
for all $a \in \mathbb{R}$. Then $\ker\varphi$ is the set of all polynomials $f(x) \in \mathbb{R}[x]$ having
$i$ as a root. For example, $x^2 + 1 \in \ker\varphi$. ▲
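Part (iii) of the example, together with Proposition 5.25(iii), can be illustrated on a finite sample of $\mathbb{Z}$. The sketch below assumes $m = 6$ and the sample window $-30, \ldots, 30$; both are arbitrary choices made here, and the name `r_m` simply mirrors the notation of the text.

```python
m = 6
window = range(-30, 31)   # a finite sample of Z (an assumption of this sketch)

def r_m(a):
    return a % m

kernel = [a for a in window if r_m(a) == 0]   # sampled part of ker r_m
image = sorted({r_m(a) for a in window})      # values that get "hit"

# Proposition 5.25(iii) on the sample: r * a stays in the kernel.
absorbs = all(r_m(r * a) == 0 for a in kernel for r in window)
print(kernel[:4], image, absorbs)
```

Note that 1 is not in the sampled kernel, in line with the remark that a kernel is rarely a subring.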
Proposition 5.25 suggests that $\ker\varphi$ is a subring of $R$ but, in fact, it almost
never is because it usually doesn’t contain 1. The definition of homomorphism
says that $\varphi(1) = 1$. If $1 \in \ker\varphi$, then $\varphi(1) = 0$, and so $1 = 0$ in $S$; that is, $S$
is the zero ring. We conclude that if $S$ has more than one element, then $\ker\varphi$
is not a subring of $R$. However, kernels are always ideals.

Definition. An ideal in a commutative ring $R$ is a subset $I$ of $R$ such that

(i) $0 \in I$

(ii) if $a, b \in I$, then $a + b \in I$

(iii) if $a \in I$ and $r \in R$, then $ra \in I$.

An ideal $I \ne R$ is called a proper ideal.

The ring $R$ itself and $\{0\}$, the subset of $R$ consisting of 0 alone, are always
ideals in a commutative ring $R$. Proposition 5.25 says that the kernel of a
homomorphism $\varphi\colon R \to S$ is always an ideal in $R$; it is a proper ideal if $S$ is not
the zero ring because $1 \notin \ker\varphi$.

We have seen ideals in a completely different context. Theorem 1.19, which 
says that gcd(a, b) is a linear combination of a, b, involved showing that the set 
of all linear combinations is an ideal in Z. Indeed, Exercise 1.49 on page 30 
makes this explicit (of course, we had not introduced the term ideal at that 
time). 


Etymology. As we said on page 131, a natural attempt to prove Fermat’s Last
Theorem involves factoring $x^p + y^p$ in the ring $\mathbb{Z}[\zeta_p]$ of cyclotomic integers,
where $\zeta_p$ is a $p$th root of unity. In Chapter 8, we shall sketch the ideas that
show that if this ring has unique factorization into primes, that is, if the analog
of the Fundamental Theorem of Arithmetic holds in $\mathbb{Z}[\zeta_p]$, then there are no
positive integers $a, b, c$ with $a^p + b^p = c^p$. For some primes $p$, such an analog
is true but, alas, there are primes for which it is false. In his investigation of
Fermat’s Last Theorem, Kummer invented ideal numbers in order to restore
unique factorization. His definition was later recast by Dedekind as the ideals
we have just defined, and this is why ideals are so called.


It is very easy to check
that $(b_1, b_2, \ldots, b_n)$ is an
ideal.


The principal ideal $(b)$ is
sometimes denoted by $Rb$.


Here is a construction of ideals that generalizes that which arose when we
studied gcd’s. Recall that a linear combination of elements $b_1, b_2, \ldots, b_n$ in a
commutative ring $R$ is an element of $R$ of the form

$$r_1 b_1 + r_2 b_2 + \cdots + r_n b_n,$$

where $r_i \in R$ for all $i$.

Definition. If $b_1, b_2, \ldots, b_n$ lie in a commutative ring $R$, then the set of all
linear combinations, denoted by

$$(b_1, b_2, \ldots, b_n),$$

is an ideal in $R$, called the ideal generated by $b_1, b_2, \ldots, b_n$. In particular, if
$n = 1$, then

$$(b) = \{rb : r \in R\}$$

consists of all the multiples of $b$; it is called the principal ideal generated by $b$.

Both $R$ and $\{0\}$ are ideals; indeed, both are principal ideals, for $R = (1)$
and, obviously, $\{0\} = (0)$ is generated by 0. Henceforth, we will denote the
zero ideal $\{0\}$ by $(0)$.

Example 5.27. (i) The even integers comprise an ideal in $\mathbb{Z}$, namely, $(2)$.

(ii) Proposition 5.25 says that if $\varphi\colon R \to S$ is a homomorphism, then $\ker\varphi$ is
an ideal in $R$. In particular, we can generalize part (i): if $r_m\colon \mathbb{Z} \to \mathbb{Z}_m$ is
reduction mod $m$, then $\ker r_m = (m)$.

(iii) If $I$ and $J$ are ideals in a commutative ring $R$, then it is routine to check
that $I \cap J$ is also an ideal in $R$. More generally, if $(I_j)_{j \in J}$ is a family
of ideals in a commutative ring $R$, then $\bigcap_{j \in J} I_j$ is an ideal in $R$ (see
Exercise 5.53 below).

(iv) By Example 5.26(iv), the set $I$, consisting of all polynomials $f(x)$ in
$\mathbb{R}[x]$ having $i$ as a root, is an ideal in $\mathbb{R}[x]$ containing $x^2 + 1$ (it is the
kernel of the evaluation $e_i$). We shall see, in Corollary 6.26, that $I =
(x^2 + 1)$. ▲

Example 5.28. Let $R$ be a commutative ring. For a subset $A$ of $R$, define

$$I = I(A) = \{f(x) \in R[x] : f(a) = 0 \text{ for all } a \in A\}.$$

It is easy to check that $I$ is an ideal in $R[x]$. Clearly, $0 \in I$. If $f \in I$ and $r \in R$,
then $(rf)^{\#} = rf^{\#}$, and so $(rf)(a) = r(f(a)) = 0$ for all $a \in A$. Finally, if
$f, g \in I$, then $(f + g)^{\#} = f^{\#} + g^{\#}$, so that $(f + g)^{\#}\colon a \mapsto f(a) + g(a) = 0$
for all $a \in A$, and $f + g \in I$. Therefore, $I$ is an ideal. (Alternatively, show
that $I(A) = \bigcap_{a \in A} \ker e_a$, where $e_a$ is evaluation at $a$, and use Exercise 5.53
below that says the intersection is an ideal.)

In the special case when $R$ is a field, then $I(A)$ is a principal ideal. If $A$ is
finite, can you find a monic $d(x)$ with $I(A) = (d)$? What if $A$ is infinite? ▲

Theorem 5.29. Every ideal $I$ in $\mathbb{Z}$ is a principal ideal.

Proof. If $I = (0)$, then $I$ is the principal ideal with generator 0. If $I \ne (0)$,
then there are nonzero integers in $I$; since $a \in I$ implies $-a \in I$, there are
positive integers in $I$; let $d \in I$ be the smallest such. Clearly, $(d) \subseteq I$. For
the reverse inclusion, let $b \in I$. The Division Algorithm gives $q, r \in \mathbb{Z}$ with
$b = qd + r$, where $0 \le r < d$. But $r = b - qd \in I$. If $r \ne 0$, then its
existence contradicts $d$ being the smallest positive integer in $I$. Hence, $r = 0$,
$d \mid b$, $b \in (d)$, and $I \subseteq (d)$. Therefore, $I = (d)$. ■
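The proof picks the smallest positive element of the ideal as its generator. For the ideal $(a, b)$ of all linear combinations $ra + tb$ in $\mathbb{Z}$, a bounded search finds that element. The sketch below is only a finite experiment: the function name `smallest_positive_combination` and the search bound are assumptions made here, not part of the text.

```python
from math import gcd

def smallest_positive_combination(a, b, bound=50):
    """Smallest positive value of r*a + t*b with |r|, |t| <= bound (a finite search)."""
    values = {r * a + t * b
              for r in range(-bound, bound + 1)
              for t in range(-bound, bound + 1)}
    return min(v for v in values if v > 0)

a, b = 84, 30
d = smallest_positive_combination(a, b)
print(d, gcd(a, b))
```

For these inputs the search returns the gcd, which matches Theorem 1.19: the generator of $(a, b)$ is $\gcd(a, b)$.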

We’ll see in the next chapter that there are commutative rings having ideals 
that are not principal ideals. 

Example 5.30. (i) If an ideal $I$ in a commutative ring $R$ contains 1, then
$I = R$, for now $I$ contains $r = r \cdot 1$ for every $r \in R$. Indeed, if $I$ contains
a unit $u$, then $I = R$, for then $I$ contains $u^{-1}u = 1$.

(ii) It follows from (i) that if $R$ is a field, then the only ideals $I$ in $R$ are $(0)$
and $R$ itself: if $I \ne (0)$, it contains some nonzero element, and every
nonzero element in a field is a unit.

Conversely, assume that $R$ is a nonzero commutative ring whose only
ideals are $R$ itself and $(0)$. If $a \in R$ and $a \ne 0$, then $(a) = \{ra : r \in R\}$
is a nonzero ideal, and so $(a) = R$; but $1 \in R = (a)$. Thus, there is $r \in R$
with $1 = ra$; that is, $a$ has an inverse in $R$, and so $R$ is a field. ▲

Proposition 5.31. A homomorphism $\varphi\colon R \to S$ is an injection if and only if
$\ker\varphi = (0)$.

Proof. If $\varphi$ is an injection, then $a \ne 0$ implies $\varphi(a) \ne \varphi(0) = 0$. Hence,
$\ker\varphi = (0)$. Conversely, assume that $\ker\varphi = (0)$. If $\varphi(a) = \varphi(b)$, then
$\varphi(a - b) = \varphi(a) - \varphi(b) = 0$; that is, $a - b \in \ker\varphi = (0)$. Therefore, $a = b$
and $\varphi$ is an injection. ■

Corollary 5.32. If $k$ is a field and $\varphi\colon k \to S$ is a homomorphism, where $S$ is
a nonzero commutative ring, then $\varphi$ is an injection.

Proof. Since $S$ is nonzero, $\ker\varphi$ is a proper ideal in $k$. The only proper ideal
in $k$ is $(0)$, by Example 5.30; now apply Proposition 5.31. ■


Exercises 


The notation $\subsetneq$ means
“is a proper subset of”
(in contrast to $\subseteq$ which
indicates a subset which
may or may not be proper).


5.43 Construct a homomorphism from Z [i ] — ► Z [/ ] that has i in its kernel. What is the 
entire kernel? 

5.44 Find the kernel of the homomorphism $\mathbb{Q}[x] \to \mathbb{Q}[\sqrt{2}]$ defined by $f \mapsto f(\sqrt{2})$,
where $\mathbb{Q}[\sqrt{2}] = \{a + b\sqrt{2} : a, b \in \mathbb{Q}\}$.

5.45 Show that the kernel of the evaluation homomorphism $e_a$ in Corollary 5.21 is the
set of polynomials in $R[x]$ that have $a$ as a root.

5.46 Consider the set / of polynomials in R[.v] that vanish on the set {3 ± s/5, 5 ± s/l}. 
Show that / is a principal ideal in R[.v], 

5.47 * Find three ideals $(a)$ in $\mathbb{Z}$ with the property that

$$(24) \subsetneq (a).$$

5.48 * Suppose $a$ and $b$ are integers. Show that $a \mid b$ if and only if $(b) \subseteq (a)$.

5.49 * If $a, b \in \mathbb{Z}$, prove that $(a) \cap (b) = (m)$, where $m = \operatorname{lcm}(a, b)$.

5.50 * Define the sum of ideals $I$ and $J$ in a commutative ring $R$ by

$$I + J = \{u + v : u \in I \text{ and } v \in J\}.$$

(i) Prove that $I + J$ is an ideal.

(ii) If $a, b \in \mathbb{Z}$, prove that $(a) + (b) = (a, b) = (d)$, where $d = \gcd(a, b)$.

5.51 * Define the product of ideals $I$ and $J$ in a commutative ring $R$ by

$$IJ = \{a_1 b_1 + \cdots + a_n b_n : a_i \in I,\ b_i \in J,\ n \ge 1\}.$$

(i) Prove that $IJ$ is an ideal in $R$.

(ii) Prove that if $I$ and $J$ are principal ideals, then $IJ$ is principal. More precisely,
if $I = (a)$ and $J = (b)$, then $IJ = (ab)$.

(iii) If $I = (a_1, \ldots, a_s)$ and $J = (b_1, \ldots, b_t)$, prove that

5.52 * Let $I$, $J$, and $Q$ be ideals in a commutative ring $R$.

(i) Prove that $IJ = JI$.

(ii) Prove that $RI = I$.

(iii) Prove that $I(JQ) = (IJ)Q$.

5.53 * If $(I_j)_{j \in J}$ is a family of ideals in a commutative ring $R$, prove that $\bigcap_{j \in J} I_j$
is an ideal in $R$.

5.54 * 

(i) If $R$ and $S$ are commutative rings, show that their direct product $R \times S$ is also
a commutative ring, where addition and multiplication in $R \times S$ are defined
coordinatewise:

$$(r, s) + (r', s') = (r + r', s + s')$$

and

$$(r, s)(r', s') = (rr', ss').$$

This construction generalizes that of $(\mathbb{Z}_2)^n$ in Exercise 5.40 on page 212.

(ii) Show that $R \times S$ is not a domain.

(iii) Show that $R \times (0)$ is an ideal in $R \times S$.

(iv) Show that $R \times (0)$ is a ring isomorphic to $R$, but it is not a subring of $R \times S$.

(v) Prove that $\mathbb{Z}_6 \cong \mathbb{Z}_2 \times \mathbb{Z}_3$.

(vi) Show that $\mathbb{Z}_4 \not\cong \mathbb{Z}_2 \times \mathbb{Z}_2$.

(vii) Prove that $\mathbb{Z}_{mn} \cong \mathbb{Z}_m \times \mathbb{Z}_n$ if $m$ and $n$ are relatively prime.

Hint. Use the Chinese Remainder Theorem.

5.55 If $R_1, \ldots, R_n$ are commutative rings, define their direct product $R_1 \times \cdots \times R_n$
by induction on $n \ge 2$ (it is the set of all $n$-tuples $(r_1, \ldots, r_n)$ with $r_i \in R_i$ for
all $i$). Prove that the ring $(\mathbb{Z}_2)^X$ in Example 5.16, where $X$ is a set with $|X| = n$,
is the direct product of $n$ copies of $\mathbb{Z}_2$.

5.56 (i) Give an example of a commutative ring $R$ with nonzero ideals $I$ and $J$ such
that $I \cap J = (0)$.

(ii) If $I$ and $J$ are nonzero ideals in a domain $R$, prove that $I \cap J \ne (0)$.

5.57 Let $F$ be the set of all $2 \times 2$ real matrices of the form



(i) Prove that $F$ is a field (with operations matrix addition and matrix multiplication).

(ii) Prove that $\varphi\colon F \to \mathbb{C}$, defined by $\varphi(A) = a + ib$, is an isomorphism.

5.4 Connections: Boolean Things 

In some high school programs, Boolean Algebra is called the “algebra of sets;”
it usually focuses on establishing set-theoretic identities like

$$A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$$

for subsets $A$, $B$, and $C$ of a set $X$. Such formulas are proved by showing that
an element lies in the left-hand side if and only if it lies in the right-hand side.

Recall: if $U, V$ are subsets
of $X$, then $U - V = \{x \in X : x \in U \text{ and } x \notin V\}$.

In Figure 5.2, $A + B$ is the
shaded region, $AB$ is the
unshaded region.

Exercises 4.68 through 4.74 on page 169 gave you practice in doing this
sort of thing, but they actually showed more. Recall Example 4.47: if $2^X$ is
the family of all the subsets of a set $X$, then $2^X$ is a commutative ring with
addition defined as symmetric difference,

$$A + B = (A - B) \cup (B - A) = A \cup B - (A \cap B),$$

and multiplication defined as intersection,

$$AB = A \cap B.$$

It follows, for all subsets $A$ of $X$, that

$$A^2 = A, \quad A + \varnothing = A, \quad A + A = \varnothing, \quad \text{and} \quad AX = A;$$

the identity element 1 is the subset $X$ itself. It follows from $A + A = \varnothing$ that every
$A \in 2^X$ is its own additive inverse; that is, $A = -A$. Indeed, Exercise 5.58
on page 226 says that $1 = -1$ in $2^X$. Since we often pass back and forth between
the commutative ring $2^X$ and set theory, we say out loud that a minus
sign will be used in set theory, as in the definition of symmetric difference, but
it shall never be used when we are working in $2^X$ viewed as a ring.
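The ring $2^X$ is small enough to explore by machine when $X$ is small. The following Python sketch assumes $X = \{1, 2, 3\}$ and models subsets as `frozenset` objects. The function names `add` and `mul` are chosen here to mirror the ring operations and are not notation from the text.

```python
from itertools import combinations

X = frozenset({1, 2, 3})
subsets = [frozenset(c) for k in range(len(X) + 1) for c in combinations(X, k)]

def add(A, B):   # addition in 2^X: symmetric difference
    return A ^ B

def mul(A, B):   # multiplication in 2^X: intersection
    return A & B

zero, one = frozenset(), X

# A^2 = A, A + 0 = A, A + A = 0, and AX = A for every subset A
identities_hold = all(
    mul(A, A) == A and add(A, zero) == A and add(A, A) == zero and mul(A, one) == A
    for A in subsets
)
# the distributive law A(B + C) = AB + AC
distributive = all(
    mul(A, add(B, C)) == add(mul(A, B), mul(A, C))
    for A in subsets for B in subsets for C in subsets
)
print(len(subsets), identities_hold, distributive)
```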

We are going to show that calculations in the ring 2 X give more satisfy- 
ing proofs of set-theoretic identities; thus, regarding all subsets as forming a 
commutative ring is a definite advantage. Another goal is to use the calcula- 
tions to establish the inclusion-exclusion principle, a very useful technique in 
counting problems. 

Venn diagrams are visual representations in the plane of relationships among 
subsets in X. They convert words into pictures. For example, symmetric dif- 
ference and intersection are illustrated by the Venn diagram in Figure 5.2. 

Some standard words occurring in set theory, actually in logic, are NOT,
AND, OR, and EXCLUSIVE OR. If we picture a statement $a$ as the inside of
a region $A$ in the plane, then the Venn diagram of “NOT $a$” is the outside of $A$;
it is the complement

$$A^c = \{x \in X : x \notin A\}.$$

Exercise 4.69 on page 168 says that $A^c = X + A$. If $a$ and $b$ are statements,
then the Venn diagram of the statement “$a$ AND $b$” is the intersection $A \cap B$,
while the diagram of “$a$ OR $b$” is the union $A \cup B$. EXCLUSIVE OR is the
symmetric difference $A + B$; it pictures the statement “$a$ OR $b$ but not both”
(as in the statement “Take it or leave it!”).

The next result is Exercise 4.73 on page 169; you probably solved this exercise
then using elements, as we now do.



Figure 5.2. $A + B$ and $AB$.

Proposition 5.33 (De Morgan). If $A$ and $B$ are subsets of a set $X$, then

$$(A \cup B)^c = A^c \cap B^c.$$

Proof. We first show that $(A \cup B)^c \subseteq A^c \cap B^c$. If $x \in (A \cup B)^c$, then
$x \notin A \cup B$. But $A \cup B$ consists of all elements in $A$ or in $B$. So, $x \notin A \cup B$
implies $x \notin A$ and $x \notin B$; that is, $x \in A^c \cap B^c$.

For the reverse inclusion, take $x \in A^c \cap B^c$. Hence, $x \in A^c$ and $x \in B^c$;
that is, $x \notin A$ and $x \notin B$. Thus, $x \notin A \cup B$, and $x \in (A \cup B)^c$. ■

This proof is not very difficult, but it’s also not very satisfying. The reasoning
very much depends on the meanings of the connectives NOT, AND, and
OR, as do the definitions of union and intersection. It feels as if we are just
playing with words.

We are going to give a second proof, more in the spirit of commutative
rings, that uses the binary operations in a special kind of commutative ring that
distills the distinguishing feature of $2^X$ into one property.

Definition. A Boolean ring is a commutative ring $R$ in which $a^2 = a$ for all
$a \in R$.

Example 5.34. (i) The ring $2^X$ of subsets of a set $X$, in Example 4.47, is a
Boolean ring.

(ii) If $X$ is a set, possibly infinite, then the family $R$ of all finite subsets of $X$
together with $X$ itself is a Boolean ring with operations symmetric difference
and intersection. ▲

Let’s extend familiar facts in $2^X$ to arbitrary Boolean rings $R$. Some of these
calculations might look strange; just keep $2^X$ and Venn diagrams in the back of
your mind as you work through them. For example, the following definitions,
inspired by the particular Boolean ring $2^X$, make sense in any Boolean ring.

Complement: $a' = 1 + a$ (see Exercise 4.69 on page 168)

Union: $a \vee b = a + b + ab$ (see Exercise 4.74(ii) on page 169)

Disjoint: $ab = 0$

Lemma 5.35. Suppose that $R$ is a Boolean ring and $a \in R$. Then:

(i) $a + a = 0$

(ii) $aa' = 0$

(iii) $a' + a = 1$.

Proof. (i)

$$\begin{aligned}
a + a &= (a + a)(a + a) \\
 &= a^2 + a^2 + a^2 + a^2 \\
 &= a + a + a + a.
\end{aligned}$$

Now subtract $a + a$ from both sides to obtain $a + a = 0$.

(ii) $aa' = a(1 + a) = a + a^2 = a + a = 0$.

(iii) $a' + a = (1 + a) + a = 1 + (a + a) = 1 + 0 = 1$. ■




Work through these proofs
for yourself, justifying
each step. Notice how
the particulars of $2^X$ are
fading into the background.


The proof that
$1 + (a + b + ab) = (1 + a)(1 + b)$ could be an
exercise in any first-year
high school algebra text.


Proposition 5.36. Let $R$ be a Boolean ring and $a, b \in R$.

(i) $a + b = ab' + a'b$, and the summands $ab'$ and $a'b$ are disjoint.

(ii) $a \vee b = ab' + a'b + ab$, and the summands $ab'$, $a'b$, and $ab$ are pairwise
disjoint.

Proof. (i) For all $x \in R$, $x + x = 0$, $xx' = 0$, and $x + x' = 1$. Hence,

$$a + b = a(b + b') + b(a + a') = ab + ab' + ab + a'b = ab' + a'b.$$

The summands are disjoint, because $(ab')(a'b) = 0$.

(ii) By part (i),

$$a \vee b = a + b + ab = ab' + a'b + ab.$$

The summands are disjoint, for part (i) shows that $ab'$ and $a'b$ are disjoint,
while $(ab')ab = 0 = (a'b)ab$. ■

Let’s now see how working in an arbitrary Boolean ring reduces the proofs
about facts in specific such rings like $2^X$ to algebraic calculations. Compare
the set-theoretic proof of Proposition 5.33 with the following proof.

Proposition 5.37 (De Morgan = Proposition 5.33). If $A$ and $B$ are subsets of
a set $X$, then

$$(A \cup B)^c = A^c \cap B^c.$$

Proof. We first work in a Boolean ring $R$ and then pass to $2^X$.

If $a, b \in R$, we want to show that $(a \vee b)' = a'b'$. But

$$(a \vee b)' = 1 + a \vee b = 1 + (a + b + ab),$$

which is equal to $a'b' = (1 + a)(1 + b)$.

Now interpret this general result in $R$ in the particular Boolean ring $2^X$,
using the translations $A \vee B = A \cup B$, $AB = A \cap B$, and $1 + A = A^c$. ■

There’s another De Morgan law in Exercise 4.73 on page 169. Algebra to
the rescue.

Proposition 5.38 (De Morgan). If $A$ and $B$ are subsets of a set $X$, then

$$(A \cap B)^c = A^c \cup B^c.$$

Proof. Let $R$ be a Boolean ring and $a, b \in R$. We want to show that

$$(ab)' = a' \vee b'.$$

The idea again is to first use “pure algebra,” reducing everything to statements
about addition and multiplication in $R$, and then translate the result into the
language of $2^X$. Now $(ab)' = 1 + ab$, and

$$a' \vee b' = a' + b' + a'b' = (1 + a) + (1 + b) + (1 + a)(1 + b).$$

Calculate:

$$\begin{aligned}
(1 + a) + (1 + b) + (1 + a)(1 + b) &= 1 + a + 1 + b + 1 + a + b + ab \\
 &= (1 + 1) + (a + a) + (b + b) + (1 + ab) \\
 &= 1 + ab \\
 &= (ab)'. \quad ■
\end{aligned}$$
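Both De Morgan laws can be checked exhaustively in one concrete Boolean ring. The sketch below makes an assumption not found in the text: it encodes the subsets of a 4-element set as the integers $0, \ldots, 15$ (bitmasks), so that ring addition is bitwise XOR and ring multiplication is bitwise AND. The names `comp` and `join` are hypothetical helpers for $a'$ and $a \vee b$.

```python
N = 4
ONE = (1 << N) - 1          # the identity element: the whole set X as a bitmask

def comp(a):                # a' = 1 + a, where + is XOR
    return ONE ^ a

def join(a, b):             # a v b = a + b + ab
    return a ^ b ^ (a & b)

elements = range(ONE + 1)   # all 16 subsets

join_is_union = all(join(a, b) == (a | b) for a in elements for b in elements)
de_morgan_1 = all(comp(join(a, b)) == comp(a) & comp(b)
                  for a in elements for b in elements)      # (a v b)' = a'b'
de_morgan_2 = all(comp(a & b) == join(comp(a), comp(b))
                  for a in elements for b in elements)      # (ab)' = a' v b'
print(join_is_union, de_morgan_1, de_morgan_2)
```

The first check confirms that the ring-theoretic definition of $a \vee b$ really is union in this model.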

We now solve an earlier exercise using this point of view. 

Proposition 5.39 (= Exercise 4.70). Let $A$, $S$ be subsets of a set $X$. Then
$S = A^c$ if and only if $A \cap S = \varnothing$ and $A \cup S = X$.

Proof. It suffices to work in a Boolean ring and then to see what it says in the
particular Boolean ring $2^X$.

Assume that $s = a' = 1 + a$. Then

$$as = a(1 + a) = a + a^2 = a + a = 0,$$

and

$$a \vee s = a \vee (1 + a) = a + (1 + a) + a(1 + a) = a + 1 + a + a + a^2 = 1.$$

Conversely, if $as = 0$ and $a + s + as = 1$, then $a + s = 1$. But $-1 = 1$
in every Boolean ring, by Exercise 5.58 on page 226, and so $s = 1 + a = a'$. ■


The usual distributive law in a commutative ring is $a(b + c) = ab + ac$. The
proof that the equation holds in $2^X$ essentially follows from the set-theoretic
identity

$$A \cap (B \cup C) = (A \cap B) \cup (A \cap C).$$

We are now going to show that interchanging $\cap$ and $\cup$ gives another valid
identity.

Proposition 5.40. If $A$, $B$ and $C$ are subsets of a set $X$, then

$$A \cup (B \cap C) = (A \cup B) \cap (A \cup C).$$

Proof. We must show that $a \vee bc = (a \vee b)(a \vee c)$; that is,

$$a + bc + abc = (a + b + ab)(a + c + ac).$$

Expand the right-hand side, remembering that $x^2 = x$ and $x + x = 0$ for all $x$:

$$\begin{aligned}
(a + b + ab)(a + c + ac) &= a^2 + ac + a^2c + ab + bc + abc \\
 &\quad + a^2b + abc + a^2bc \\
 &= a + bc + abc. \quad ■
\end{aligned}$$
In Exercises 5.66 and
5.67, we use minus signs
in a Boolean algebra.
Since $-1 = +1$, all these
signs are really $+$, but
this notation invites you to
compare these formulas
with the statement of
Inclusion-Exclusion.


Exercises 

5.58 * Prove that $-1 = 1$ in every Boolean ring.

5.59 * Proposition 5.40 proves that if $A$, $B$, and $C$ are subsets of a set $X$, then

$$A \cup (B \cap C) = (A \cup B) \cap (A \cup C).$$

Give another proof using set theory.

5.60 * If $A$, $B_1$, $B_2$, and $B_3$ are subsets of a set $X$, show that

$$A \cap (B_1 \cup B_2 \cup B_3) = (A \cap B_1) \cup (A \cap B_2) \cup (A \cap B_3).$$

Generalize to $A \cap \bigl(\bigcup_{i=1}^n B_i\bigr)$.

5.61 Let $R$ be a ring in which multiplication is not assumed to be commutative (see
the callout on page 156). If $a^2 = a$ for every $a \in R$, prove that $R$ must be a
commutative ring.

5.62 (i) If $A$, $B$ are subsets of a set $X$, prove that $B - A = B \cap A^c$.

(ii) In any Boolean ring $R$, prove that $b + a = b(1 + a) + a(1 + b)$.

5.63 Suppose that $R$ is a Boolean ring and $a, b \in R$. Show that

$$a'b' = 1 - a - b + ab,$$

where $a' = 1 + a$.

5.64 Suppose that $R$ is a Boolean ring and $a, b \in R$. Show that

$$1 - a'b' = a \vee b.$$

5.65 Suppose that $R$ is a Boolean ring and $a, b, c \in R$. Show that

$$a \vee (b \vee c) = (a \vee b) \vee c.$$

5.66 Suppose that $R$ is a Boolean ring and $(a_i)_{i=1}^n$ is a collection of $n$ elements in $R$.
Show that

$$1 - \prod_{i=1}^n a_i' = \bigvee_{i=1}^n a_i.$$

5.67 Suppose that $R$ is a Boolean ring and $(a_i)_{i=1}^n$ is a collection of $n$ elements in $R$.
Show that

$$\prod_{i=1}^n a_i' = 1 - \sum_{1 \le i \le n} a_i + \sum_{1 \le i < j \le n} a_i a_j - \cdots + (-1)^n a_1 a_2 \cdots a_n.$$

Hint. $a' = 1 - a$.

5.68 Suppose that $R$ is a Boolean ring and $(a_i)_{i=1}^n$ is a collection of $n$ elements in $R$.
Show that

$$\bigvee_{i=1}^n a_i = \sum_{1 \le i \le n} a_i - \sum_{1 \le i < j \le n} a_i a_j + \cdots + (-1)^{n-1} a_1 a_2 \cdots a_n.$$

5.69 In a Boolean ring, define $a \le b$ to mean $a = ab$. Viewing $2^X$ as a Boolean ring,
prove that $A \le B$ in $2^X$ if and only if $A \subseteq B$.

5.70 An atom in a Boolean ring $R$ is a nonzero element $a \in R$ with $x \le a$ if and only
if $x = 0$ or $x = a$. If $R$ is a finite Boolean ring, prove that every $x \in R$ is a sum
of atoms.

5.71 (i) If $R$ is a finite Boolean ring, prove that $R \cong 2^X$, where $X$ is the set of all
atoms in $R$.

(ii) Take It Further. Let $R$ be the Boolean ring of all finite subsets of an infinite
set $X$ (see Example 5.34(ii)). Prove that $R \not\cong 2^Y$ for any set $Y$.

Hint. The simplest solution involves some set theory we have not discussed.
If $X$ is countable, then $R$ is countable; however, if $Y$ is any infinite set, then
$2^Y$ is uncountable. Hence, there is no bijection $R \to 2^Y$.

Inclusion-Exclusion 

Suppose you have a class of students, all of whom take either French or Spanish,
but none of whom take both. If 15 students are studying French and 12
students are studying Spanish, you have $15 + 12 = 27$ students in your class.
Denote the number of elements in a finite set $A$ by

$$|A|.$$

Then one way to state the above fact is that if $F$ is the set of students studying
French and $S$ is the set of students studying Spanish, then

$$|F \cup S| = |F| + |S|.$$

We make the above counting principle explicit.


Addition Principle. If $A$ and $B$ are disjoint finite subsets of a set $X$, then

$$|A \cup B| = |A| + |B|.$$


The Addition Principle extends, by induction, to any number of finite sets.

Lemma 5.41. If $(A_i)_{i=1}^n$ is a family of pairwise disjoint finite sets, then

$$\Bigl|\bigcup_{i=1}^n A_i\Bigr| = \sum_{i=1}^n |A_i|.$$

Proof. The proof is by induction on $n \ge 2$. The base step is the Addition
Principle. For the inductive step,

$$\bigcup_{i=1}^n A_i = \Bigl(\bigcup_{i=1}^{n-1} A_i\Bigr) \cup A_n.$$

Now $\bigl(\bigcup_{i=1}^{n-1} A_i\bigr) \cap A_n = \varnothing$: Exercise 5.60 on page 226 gives

$$\Bigl(\bigcup_{i=1}^{n-1} A_i\Bigr) \cap A_n = (A_1 \cap A_n) \cup \cdots \cup (A_{n-1} \cap A_n),$$

and each $A_i \cap A_n = \varnothing$ because the subsets are pairwise disjoint. Hence, the
Addition Principle and the inductive hypothesis give
$\bigl|\bigcup_{i=1}^n A_i\bigr| = \sum_{i=1}^{n-1} |A_i| + |A_n|$. ■
Let’s return to your class of students, 15 of whom are studying French and
12 of whom are studying Spanish. What if 4 of them are studying both French
and Spanish? You’d then have fewer than 27 students in the class because of
double counting. A Venn diagram can help you figure out how to calculate
the actual number. The goal of this subsection is to develop a general method
of calculating the number of elements in the union of a finite collection of
possibly overlapping finite sets.

As a Venn diagram illustrates, the Addition Principle no longer holds if $A$
and $B$ overlap, for elements in $A \cap B$ are counted twice in $|A| + |B|$. What is
the formula giving a precise count of $|A \cup B|$? The number of things that get
counted twice must be subtracted once.

Lemma 5.42. If $A$ and $B$ are finite subsets of a set $X$, then

$$|A \cup B| = |A| + |B| - |A \cap B|.$$

Proof. First note that $A \cup B$ is the disjoint union

$$A \cup B = (A - B) \cup (B - A) \cup (A \cap B),$$

so that Lemma 5.41 gives

$$|A \cup B| = |A \cap B^c| + |A^c \cap B| + |A \cap B|. \tag{5.1}$$

As usual, we first compute in a Boolean ring $R$, after which we specialize
to $2^X$. Recall Proposition 5.36(ii): if $a, b \in R$, then

$$a \vee b = ab' + a'b + ab,$$

where the summands on the right-hand side are pairwise disjoint. Hence, there
are two more equations: factor out $a$ to get

$$a \vee b = a(b' + b) + a'b = a + a'b,$$

or factor out $b$ to get

$$a \vee b = ab' + b(a' + a) = ab' + b.$$

Since the summands on the right-hand side of each of the equations are pairwise
disjoint, we can pass back to $2^X$ to obtain

$$|A \cup B| = |A| + |A^c \cap B|$$

and

$$|A \cup B| = |B| + |A \cap B^c|.$$

Add the equations:

$$2|A \cup B| = |A| + |B| + |A^c \cap B| + |A \cap B^c|. \tag{5.2}$$

Now Eq. (5.1) says that the last two terms on the right-hand side of Eq. (5.2)
can be replaced by $|A \cup B| - |A \cap B|$, giving

$$2|A \cup B| = |A| + |B| + |A \cup B| - |A \cap B|.$$

Subtracting $|A \cup B|$ from both sides gives the desired result. ■
Example 5.43. How many positive integers $< 1000$ are there that are not
divisible by 5 or by 7? If the number of positive integers $< 1000$ that are
divisible by 5 or 7 is $D$, then the answer is $999 - D$. We compute $D$ using
Lemma 5.42.

Let
\[ A = \{n \in \mathbb{Z} : 5 \mid n \text{ and } 0 < n < 1000\} \]
and
\[ B = \{n \in \mathbb{Z} : 7 \mid n \text{ and } 0 < n < 1000\}. \]
The Division Algorithm gives $|A| = 199$, because $999 = 199 \cdot 5 + 4$; similarly,
$|B| = 142$ and $|A \cap B| = 28$, where
$A \cap B = \{n \in \mathbb{Z} : 35 \mid n \text{ and } 0 < n < 1000\}$. Hence,
\begin{align*}
|A \cup B| &= |A| + |B| - |A \cap B| \\
           &= 199 + 142 - 28 = 313.
\end{align*}
Therefore, there are exactly $999 - 313 = 686$ positive numbers $< 1000$ that
are not divisible by 5 or by 7. ▲
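The count in Example 5.43 is easy to confirm by machine. The following is only a sketch: the helper name count_multiples is ours, not the text's, and the brute-force loop is there purely as an independent check on Lemma 5.42.

```python
def count_multiples(d, limit):
    """Number of positive integers below `limit` that are divisible by d."""
    return (limit - 1) // d

limit = 1000
size_a = count_multiples(5, limit)       # |A|
size_b = count_multiples(7, limit)       # |B|
size_ab = count_multiples(35, limit)     # |A ∩ B|, since lcm(5, 7) = 35
union = size_a + size_b - size_ab        # Lemma 5.42

# Independent check: test every integer directly.
brute = sum(1 for n in range(1, limit) if n % 5 != 0 and n % 7 != 0)

print(size_a, size_b, size_ab, union)    # 199 142 28 313
print((limit - 1) - union, brute)        # 686 686
```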


How to Think About It. You could probably convince yourself of the result 
in Lemma 5.42 with a Venn diagram accompanied by a few examples. While 
diagrams and examples can motivate insight, they are not substitutes for rig- 
orous proof. The reason is that a picture can be misleading. For example, if 
you aren’t careful about drawing a Venn diagram for the union of four or more 
regions, then some possible intersections might be overlooked. 


Example 5.44. Let’s look at the case of three finite subsets $A$, $B$, and $C$ of a
set $X$. Before reading on, what do you think the formula should be? The basic
idea is to apply Lemma 5.42 twice.
\begin{align*}
|A \cup B \cup C| &= |(A \cup B) \cup C| \\
  &= |A \cup B| + |C| - |(A \cup B) \cap C| \\
  &= |A| + |B| - |A \cap B| + |C| - |(A \cap C) \cup (B \cap C)| \\
  &= |A| + |B| + |C| - |A \cap B| - \bigl(|A \cap C| + |B \cap C| - |A \cap B \cap C|\bigr) \\
  &= |A| + |B| + |C| - \bigl(|A \cap B| + |A \cap C| + |B \cap C|\bigr) + |A \cap B \cap C|.
\end{align*}
So, the number of elements in the union of three sets is the sum of the number
of elements in each, minus the sum of the number of elements in the pairwise
intersections, plus the number of elements that are common to all three. ▲

We want to generalize the formula in Example 5.44 to count the number of 
elements in a union of finitely many subsets. The difficulty in deriving such 
a formula by a brutal assault is that we must be careful that an element in 
the union is not counted several times, for it may occur in the intersection of 
several of the $A_i$. To illustrate, consider Figures 5.3 and 5.4, Venn diagrams
depicting the various intersections obtained from three subsets and from four 
subsets. 
To count the number of elements in a union
\[ A_1 \cup \cdots \cup A_r, \]
we proceed by induction, using the idea of Example 5.44 by shearing off $A_r$
and treating the union of the rest as one set. The details are technical, so
sharpen a pencil and follow along.

Given finite subsets $A_1, \ldots, A_r$ of a set $X$, let us write
\begin{align*}
A_{ij} &= A_i \cap A_j, && \text{where } 1 \le i < j \le r, \\
A_{ijk} &= A_i \cap A_j \cap A_k, && \text{where } 1 \le i < j < k \le r, \\
 &\;\;\vdots \\
A_{i_1 i_2 \cdots i_q} &= A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_q}, && \text{where } 1 \le i_1 < i_2 < \cdots < i_q \le r, \\
 &\;\;\vdots \\
A_{12 \cdots r} &= A_1 \cap \cdots \cap A_r.
\end{align*}

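Theorem 5.45 (Inclusion-Exclusion). Given finite subsets $A_1, \ldots, A_r$ of a
set $X$, we have
\[ |A_1 \cup \cdots \cup A_r| = \sum_{i \le r} |A_i| - \sum_{i < j \le r} |A_{ij}| + \sum_{i < j < k \le r} |A_{ijk}| - \cdots + (-1)^{r-1} |A_{12 \cdots r}|. \]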

Proof. The proof is by induction on $r \ge 2$. The base step is Lemma 5.42.

For the inductive step, the same Lemma gives
\[ |(A_1 \cup \cdots \cup A_{r-1}) \cup A_r| = |A_1 \cup \cdots \cup A_{r-1}| + |A_r| - |(A_1 \cup \cdots \cup A_{r-1}) \cap A_r|. \]
Now
\begin{align*}
|(A_1 \cup \cdots \cup A_{r-1}) \cap A_r| &= |(A_1 \cap A_r) \cup \cdots \cup (A_{r-1} \cap A_r)| \\
  &= |A_{1r} \cup \cdots \cup A_{r-1\,r}|,
\end{align*}
and the inductive hypothesis gives
\[ |A_1 \cup \cdots \cup A_{r-1}| = \sum_{i \le r-1} |A_i| - \sum_{i < j \le r-1} |A_{ij}| + \cdots + (-1)^{r-2} |A_{12 \cdots (r-1)}| \]
as well as
\[ |A_{1r} \cup \cdots \cup A_{r-1\,r}| = \sum_{i < r} |A_{ir}| - \sum_{i < j < r} |A_{ijr}| + \cdots + (-1)^{r-2} |A_{12 \cdots r}|. \]
Finally, collect terms, realizing that $-(-1)^{r-2} = (-1)^{r-1}$. ■
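Theorem 5.45 translates almost word for word into a program: sum the sizes of all $q$-fold intersections with sign $(-1)^{q-1}$. The sketch below is ours (the function name union_size and the sample sets are illustrative assumptions), and it compares the alternating sum with a direct count of the union.

```python
from itertools import combinations

def union_size(sets):
    """Size of the union of finite sets, computed via Inclusion-Exclusion."""
    total = 0
    r = len(sets)
    for q in range(1, r + 1):
        sign = (-1) ** (q - 1)
        # One term |A_{i_1 ... i_q}| for each choice of q of the r sets.
        for chosen in combinations(sets, q):
            total += sign * len(set.intersection(*chosen))
    return total

# Illustrative data: multiples of 2, 3, 5, 7 among 1, ..., 199.
multiples = [set(range(d, 200, d)) for d in (2, 3, 5, 7)]
print(union_size(multiples) == len(set.union(*multiples)))   # True
```

The number of terms grows like $2^r$, so this is a check on the formula rather than an efficient way to count.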
In Exercise 5.76 below, you will use Inclusion-Exclusion to give a formula
computing the Euler $\phi$-function $\phi(n)$.

Here is an interesting special case of Theorem 5.45 that applies when all
intersections $A_{i_1} \cap \cdots \cap A_{i_q}$, for each $q \le r$, have the same number of elements.

Corollary 5.46 (Uniform Inclusion-Exclusion). If $A_1, \ldots, A_r$ are finite
subsets of a set $X$ such that, for each $q \le r$, there is an integer $s_q$ with
\[ |A_{i_1} \cap \cdots \cap A_{i_q}| = s_q, \]
then
\[ |A_1 \cup \cdots \cup A_r| = r s_1 - \binom{r}{2} s_2 + \binom{r}{3} s_3 - \cdots + (-1)^{r-1} s_r. \]


Proof. By hypothesis, $|A_i| = s_1$ for all $i$, and so $\sum_{i \le r} |A_i| = r s_1$. How many
terms are there in the sum $\sum_{1 \le i_1 < \cdots < i_q \le r} |A_{i_1 \cdots i_q}|$? If $q = 2$, there is one term
$|A_{ij}| = |A_i \cap A_j|$ for each pair of distinct $A_i, A_j$ in $\{A_1, \ldots, A_r\}$; that is,
there’s one term for each choice of 2 of the $r$ subsets. If $q = 3$, there is one term
$|A_{ijk}| = |A_i \cap A_j \cap A_k|$ for each triple of distinct $A_i, A_j, A_k$ in $\{A_1, \ldots, A_r\}$;
that is, there’s one term for each choice of 3 of the $r$ subsets. In general, there
are $r$ choose $q$ terms in the sum $\sum_{1 \le i_1 < i_2 < \cdots < i_q \le r} |A_{i_1 i_2 \cdots i_q}|$; thus, there are
$\binom{r}{q}$ terms of the form $|A_{i_1 i_2 \cdots i_q}|$. Therefore, the sum $\sum_{1 \le i_1 < \cdots < i_q \le r} |A_{i_1 \cdots i_q}|$ in
the Inclusion-Exclusion formula is here equal to $\binom{r}{q} s_q$. ■


Example 5.47. Social Security numbers are 9-digit numbers of the form xxx- 
xx-xxxx (there are some constraints on the digits, but let’s not worry about 
them here). How many Social Security numbers are there that contain all the 
odd digits? 

As usual, it is easier to compute the size of the complement of a union. Let
$X$ be the set of all 9-digit numbers and, for $i = 1, 3, 5, 7, 9$, let
\[ R_i = \{n \in X : i \text{ is not a digit in } n\}. \]
Thus, $R_1 \cup R_3 \cup R_5 \cup R_7 \cup R_9$ consists of all 9-digit numbers missing at least
one odd digit. There are $10^9$ Social Security numbers. For each $i$, we have
$|R_i| = 9^9$ (for $i$ does not occur). If $i < j$, then $|R_i \cap R_j| = 8^9$ (for $i$ and
$j$ do not occur); if $i < j < k$, then $|R_i \cap R_j \cap R_k| = 7^9$, and so forth. By
Corollary 5.46,
\[ |R_1 \cup R_3 \cup R_5 \cup R_7 \cup R_9| = 5 \cdot 9^9 - \binom{5}{2} 8^9 + \binom{5}{3} 7^9 - \binom{5}{4} 6^9 + 5^9. \]
Therefore, the answer is $10^9 - |R_1 \cup R_3 \cup R_5 \cup R_7 \cup R_9|$. ▲
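For readers who do want the number in Example 5.47, here is a small sketch. The function names and the parameterization (string length, alphabet size, number of required symbols) are our own assumptions; the brute-force routine is only feasible for small cases and is included to check the formula.

```python
from itertools import product
from math import comb

def count_containing_all(length, alphabet_size, required):
    """Strings of `length` symbols, drawn from `alphabet_size` symbols, that
    contain each of `required` distinguished symbols at least once."""
    # Corollary 5.46 with s_q = (alphabet_size - q) ** length.
    missing_some = sum(
        (-1) ** (q - 1) * comb(required, q) * (alphabet_size - q) ** length
        for q in range(1, required + 1)
    )
    return alphabet_size ** length - missing_some

def brute_force(length, alphabet_size, required):
    """Direct enumeration; symbols 0, ..., required - 1 are the required ones."""
    wanted = set(range(required))
    return sum(1 for s in product(range(alphabet_size), repeat=length)
               if wanted <= set(s))

print(count_containing_all(4, 4, 2) == brute_force(4, 4, 2))   # True
print(count_containing_all(9, 10, 5))                          # 49974120
```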


(You can compute this number explicitly if you really care to know it.)


Exercises


5.72 There is a class of students, all of whom are taking French or Spanish. If 15 
students are studying French, 12 are studying Spanish, and 4 are studying both, 
how many students are in the class? Notice that “or” is not “exclusive or.” 


Answer. 23. 
5.73 There is a class of students, all of whom are taking either French, German, or
Spanish. Suppose that 15 students are studying French, 12 students are studying
German, and 10 students are studying Spanish; moreover, 4 students are studying
French and German, 5 are studying German and Spanish, and 3 are studying
French and Spanish. One brave soul is studying all three at once. How many
students are in the class?

Answer. 26. 

5.74 Is “Inclusion-Exclusion” an appropriate name for Theorem 5.45? Why? 

5.75 Elvis is playing a game in which he tosses a fair coin and rolls a fair die. He wins 
if either the coin comes up heads or the die rolls a multiple of 3. What is the 
probability that Elvis wins the game? 

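5.76 * Recall that if $p$ is a prime and $\phi$ is the Euler $\phi$-function, then $\phi(p) = p - 1$
(see page 111).

(i) Suppose $n = p_1^{e_1} p_2^{e_2} p_3^{e_3}$ is a product of three prime powers. Show that
\[ \phi(n) = n - \frac{n}{p_1} - \frac{n}{p_2} - \frac{n}{p_3} + \frac{n}{p_1 p_2} + \frac{n}{p_1 p_3} + \frac{n}{p_2 p_3} - \frac{n}{p_1 p_2 p_3}. \]

(ii) Generalize to show that if $n = p_1^{e_1} \cdots p_k^{e_k}$, where $p_1, \ldots, p_k$ are distinct
primes, then
\[ \phi(n) = n \Bigl( 1 - \sum_{i} \frac{1}{p_i} + \sum_{i < j} \frac{1}{p_i p_j} - \sum_{i < j < l} \frac{1}{p_i p_j p_l} + \cdots + (-1)^k \frac{1}{p_1 p_2 \cdots p_k} \Bigr). \]
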



(iii) Using the notation of part (ii), show that 
The two most important rings appearing in precollege mathematics are $\mathbb{Z}$ and
$k[x]$ (where $k$ is usually $\mathbb{Q}$, $\mathbb{R}$, or $\mathbb{C}$). The goal of this chapter is to show that
these rings share some basic structural properties: both are domains, each has a
division algorithm, and non-units in each are products, in essentially only one
way, of irreducibles (primes in $\mathbb{Z}$, polynomials in $k[x]$ having no nontrivial
factorizations); there are numerous other parallels as well. Our program is to
take familiar results about $\mathbb{Z}$ and investigate their analogs in $k[x]$. Sometimes a
translation from $\mathbb{Z}$ to $k[x]$ is quite simple: not only is the analog of a theorem
in Chapter 1 true, but so is its proof, mutatis mutandis; in other cases, however,
some modifications in proofs are necessary.

6.1 Parallels to Z 
Divisibility 

Let’s begin with a discussion of divisibility. 

Definition. If $R$ is a commutative ring and $a, b \in R$, then $a$ is a divisor of $b$,
denoted by
\[ a \mid b, \]
if there is $r \in R$ with $b = ar$. We continue using the usual synonyms: $a$
divides $b$ or $b$ is a multiple of $a$.

The next result, analogous to Lemma 1.13, will be very useful in what
follows. It allows us to use degree in $k[x]$ as a proxy for absolute value in $\mathbb{Z}$.

Lemma 6.1. Let $k$ be a field and let $f(x), g(x) \in k[x]$. If $f \ne 0$ and $f \mid g$,
then
\[ \deg(f) \le \deg(g). \]

Proof. If $g = fq$, where $q(x) \in k[x]$, then Lemma 5.8(ii) gives $\deg(g) =
\deg(fq) = \deg(f) + \deg(q)$. Since $\deg(q) \ge 0$, we have $\deg(f) \le \deg(g)$. ■

Recall that a unit in a commutative ring $R$ is an element that has a
multiplicative inverse in $R$. The only units in $\mathbb{Z}$ are $\pm 1$, but the polynomial ring $k[x]$,
where $k$ is a field, has many units, as the next proposition shows.

Proposition 6.2. If $k$ is a field, then $u(x) \in k[x]$ is a unit if and only if $u$ is a
nonzero polynomial of degree 0; that is, $u$ is a nonzero constant.


Sometimes, we’ll denote a
polynomial by $f(x)$; other
times, we’ll simply write
$f$. Both conventions are
commonly used in algebra.





234 Chapter 6 Arithmetic of Polynomials 


Proof. If $u$ is a unit, then there is a polynomial $v(x) \in k[x]$ with $uv = 1$. Thus,
$u \mid 1$ and, by Lemma 6.1, we have $\deg(u) \le \deg(1) = 0$. Hence, $\deg(u) = 0$.

Conversely, if $\deg(u) = 0$, then $u \in k$. Since $k$ is a field and $u \ne 0$, there
is an inverse $u^{-1}$ in $k$; that is, $u$ is a unit in $k$. A fortiori, $u$ is a unit in $k[x]$. ■

Describing the units in $k[x]$ when $k$ is not a field is much more complicated.
For example, a nonzero constant need not be a unit: 5 is not a unit in $\mathbb{Z}[x]$. And
a unit need not be a constant: $(2x + 1)^2 = 4x^2 + 4x + 1 = 1$ in $\mathbb{Z}_4[x]$, so that
$2x + 1$ is a unit in $\mathbb{Z}_4[x]$ (it is its own inverse).
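The computation with $2x + 1$ can be replayed mechanically. In this sketch (our own helper, not notation from the text) a polynomial is a list of coefficients with the constant term first, and poly_mul_mod multiplies in $\mathbb{Z}_m[x]$; the modulus 5 is included only for contrast.

```python
def poly_mul_mod(f, g, m):
    """Multiply coefficient lists (constant term first) in Z_m[x]."""
    product = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            product[i + j] = (product[i + j] + a * b) % m
    # Drop leading zero coefficients so that the list has a canonical form.
    while len(product) > 1 and product[-1] == 0:
        product.pop()
    return product

u = [1, 2]                        # 2x + 1
print(poly_mul_mod(u, u, 4))      # [1]        the constant polynomial 1 in Z_4[x]
print(poly_mul_mod(u, u, 5))      # [1, 4, 4]  4x^2 + 4x + 1 in Z_5[x]
```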

Multiplying an element of a commutative ring by a unit doesn’t change any 
of its essential algebraic properties. It’s convenient to give a name to elements 
that are so related. 

Definition. An associate of an element $a$ in a commutative ring $R$ is an
element of the form $ua$ for some unit $u \in R$.


Example 6.3. (i) Since the only units in $\mathbb{Z}$ are $\pm 1$, the associates of an
integer $m$ are $\pm m$.

(ii) There are only four units in the Gaussian integers $\mathbb{Z}[i]$, by Proposition
4.42: namely $\pm 1$ and $\pm i$. Hence, every nonzero Gaussian integer $z$ has
four associates: $z, -z, iz, -iz$.

(iii) There are exactly six units in the Eisenstein integers $\mathbb{Z}[\omega]$, where $\omega =
\frac{1}{2}(-1 + i\sqrt{3})$, by Exercise 4.45 on page 165. Hence, every nonzero Eisenstein
integer $z$ has exactly six associates: $\pm z, \pm\omega z, \pm\omega^2 z$.

(iv) If $k$ is a field, Proposition 6.2 says that the associates of $f(x) \in k[x]$ are
the multiples $uf$ for nonzero $u \in k$. ▲

Proposition 6.4. If $k$ is a field, every nonzero polynomial in $k[x]$ has a monic
associate.

Proof. If the leading coefficient of $f$ is $c$, then $c$, being a nonzero element of
$k$, is a unit, and so $f$ is associate to $c^{-1} f$. ■

In a commutative ring $R$, every element $a \in R$ is divisible by units $u$ (for
$a = u(u^{-1} a)$) and associates $ua$ (for $a = u^{-1}(ua)$). An element having only
these obvious divisors is called irreducible.


The definition of prime on 
page 22 says that primes 
are positive. 


Definition. An element $a$ in a commutative ring $R$ is irreducible in $R$ if it is
neither zero nor a unit and its only divisors are units and associates.

An integer $n$ is irreducible in $\mathbb{Z}$ if and only if $n = \pm p$ for some prime $p$;
that is, $n$ is an associate of a prime. When $k$ is a field, Proposition 6.2 implies
that every associate $uf$ of a polynomial $f(x) \in k[x]$ has the same degree as $f$,
and it is easy to see that if $f$ is irreducible, then $uf$ is also irreducible.


How to Think About It. We have defined irreducible in $R$, not irreducible,
for irreducibility depends on the ambient ring. In particular, irreducibility of a
polynomial in $k[x]$ depends on the coefficient ring $k$, hence on $R = k[x]$. For
example, the polynomial $x^2 + 1$, when viewed as lying in $\mathbb{R}[x]$, is irreducible.
On the other hand, when $x^2 + 1$ is viewed as lying in the larger ring $\mathbb{C}[x]$,
it is not irreducible, for $x^2 + 1 = (x + i)(x - i)$ and neither factor is a unit.
Similarly, a prime $p$ may factor in some larger commutative ring containing $\mathbb{Z}$.
For example, in the Gaussian integers $\mathbb{Z}[i]$, the prime 5 in $\mathbb{Z}$ factors: $5 =
(2 + i)(2 - i)$. Since the only units in $\mathbb{Z}[i]$ are $\pm 1$ and $\pm i$, by Example 6.3(ii),
the factors are neither units nor associates of 5 in $\mathbb{Z}[i]$.


In general, testing a polynomial for irreducibility is hard. Here is a
criterion for irreducibility of polynomials over fields that uses degree to narrow the
kinds of polynomials that need to be tested as factors.

Proposition 6.5. Let $k$ be a field and let $f(x) \in k[x]$ be a nonconstant
polynomial. Then $f$ is irreducible in $k[x]$ if and only if it has no factorization $f = gh$
in $k[x]$ with both factors having degree $< \deg(f)$.

Proof. If $f$ is irreducible in $k[x]$ and $f = gh$, then one factor, say $g$, is a
unit (why?). By Proposition 6.2, we have $\deg(g) = 0 < \deg(f)$, for $f$ is
nonconstant.

Conversely, if $f = gh$ and $f$ is not a product of polynomials of smaller
degree, then one factor, say $g$, must have degree 0, hence it is a unit. Therefore
$f$ is irreducible in $k[x]$. ■
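When $k$ is a finite field, Proposition 6.5 turns irreducibility into a finite search. The sketch below assumes $k = \mathbb{Z}_p$ for a prime $p$ (our choice, so that there are only finitely many candidates), represents polynomials as coefficient lists with the constant term first, and uses hypothetical helper names. If $f = gh$ with both degrees smaller than $\deg(f)$, one factor has degree at most $\deg(f)/2$, and by Proposition 6.4 it may be taken monic.

```python
from itertools import product

def remainder_mod_p(g, f, p):
    """Remainder of g on division by a monic f in Z_p[x]."""
    r = [c % p for c in g]
    while len(r) >= len(f):
        lead = r[-1]
        shift = len(r) - len(f)
        if lead:
            for i, c in enumerate(f):          # subtract lead * x^shift * f
                r[shift + i] = (r[shift + i] - lead * c) % p
        r.pop()                                # the leading coefficient is now zero
    return r

def is_irreducible_mod_p(f, p):
    """Test a nonconstant f in Z_p[x], p prime, via Proposition 6.5."""
    n = len(f) - 1
    for d in range(1, n // 2 + 1):
        # Every divisor has a monic associate, so monic candidates suffice.
        for tail in product(range(p), repeat=d):
            candidate = list(tail) + [1]
            if not any(remainder_mod_p(f, candidate, p)):
                return False
    return True

print(is_irreducible_mod_p([1, 0, 1], 3))        # True:  x^2 + 1 over Z_3
print(is_irreducible_mod_p([1, 0, 1], 5))        # False: x^2 + 1 = (x + 2)(x + 3) over Z_5
print(is_irreducible_mod_p([1, 0, 1, 0, 1], 2))  # False: x^4 + x^2 + 1 = (x^2 + x + 1)^2 over Z_2
```

The last example has no roots in $\mathbb{Z}_2$, so checking linear factors alone would not have been enough.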

Every linear polynomial $a(x) = rx + s \in k[x]$, where $k$ is a field, is
irreducible in $k[x]$: if $a = fg$, then $1 = \deg(a) = \deg(f) + \deg(g)$. Hence,
$\deg(f), \deg(g) \in \{0, 1\}$. It follows that one degree is 0 and the other is 1,
and so $a$ is irreducible, by Proposition 6.5. There are fields $k$ whose only
irreducible polynomials are linear; for example, the Fundamental Theorem of
Algebra says that $\mathbb{C}$ is such a field.

Proposition 6.5 need not be true if the ring of coefficients is not a field. 
Indeed, linear polynomials need not be irreducible. For example, 5x + 5 = 
5(x + 1) is not irreducible in Z[x], even though one factor has degree 0 and 
the other degree 1, for 5 is not a unit in Z[x]. 

Proposition 6.6. Let $R$ be a domain and let $a, b \in R$.

(i) $a \mid b$ and $b \mid a$ if and only if $a$ and $b$ are associates.

(ii) Let $k$ be a field and $a, b \in R = k[x]$ be monic polynomials. If $a \mid b$ and
$b \mid a$, then $a = b$.

Proof. (i) If $a \mid b$ and $b \mid a$, there are $r, s \in R$ with $b = ra$ and $a = sb$, and
so $b = ra = rsb$. If $b = 0$, then $a = 0$ (because $b \mid a$); if $b \ne 0$, then
we may cancel it ($R$ is a domain) to obtain $1 = rs$. Hence, $r$ and $s$ are
units, and $a$ and $b$ are associates. The converse is obvious (and it does not
need the hypothesis that $R$ be a domain).

(ii) Corollary 5.9 tells us that $R$ is a domain, so, by part (i), there is a unit
$u \in k[x]$ with $a = ub$. Now $u$ is a nonzero constant, by Proposition 6.2.
Because $a \mid b$ and $b \mid a$, $a$ and $b$ have the same degree (by Lemma 6.1),
say $m$. Since they are monic, the leading coefficient of $ub$ is $u$ and the
leading coefficient of $a$ is 1. Hence $u = 1$ and $a = b$. ■

The next example shows that we need the hypothesis in Proposition 6.6 that 
R be a domain. 


And the other factor is
$h = g^{-1} f$, an associate
of $f$.


See Corollary 6.15. 
Example 6.7 (Kaplansky). Let $X$ be the interval $[0, 3]$. We claim that there
are elements $a, b \in C(X)$ (see Example 4.31(ii)) each of which divides the
other yet which are not associates. Define
\begin{align*}
a(t) &= 1 - t = b(t) && \text{for all } t \in [0, 1], \\
a(t) &= 0 = b(t) && \text{for all } t \in [1, 2], \\
a(t) &= t - 2 && \text{for all } t \in [2, 3], \\
b(t) &= -t + 2 && \text{for all } t \in [2, 3].
\end{align*}
If $v \in C(X)$ satisfies $v(t) = 1$ for all $t \in [0, 1]$ and $v(t) = -1$ for all $t \in [2, 3]$,
then it is easy to see that $b = av$ and $a = bv$ (same $v$); hence, $a$ and $b$ divide
each other.

Suppose $a$ and $b$ are associates: there is a unit $u \in C(X)$ with $b = au$. As
for $v$ above, $u(t) = 1$ for all $t \in [0, 1]$ and $u(t) = -1$ for all $t \in [2, 3]$; in
particular, $u(1) = 1$ and $u(2) = -1$. Since $u$ is continuous, the Intermediate
Value Theorem of calculus says that $u(t) = 0$ for some $t \in [1, 2]$. But this
contradicts Exercise 4.41(ii) on page 164, which says that units in $C(X)$ are
never 0. ▲


The next result shows that irreducible polynomials over a field behave like 
primes in Z; they are “building blocks” in the sense that every nonconstant 
polynomial can be expressed in terms of them. 

Proposition 6.8. If $k$ is a field, then every nonconstant polynomial in $k[x]$ is
a product of irreducibles.


We continue to use the 
term product as we have in 
earlier chapters: a product 
can have only one factor. 
Thus, it’s okay to say that 
a single irreducible is a 
product of irreducibles. 


Proof. If the proposition is false, then the set
\[ C = \{a(x) \in k[x] : a \text{ is neither a constant nor a product of irreducibles}\} \]
is nonempty. Let $h(x) \in C$ have least degree (the Least Integer Axiom
guarantees $h$ exists). Since $h \in C$, it is not a unit, and so $0 < \deg(h)$; since $h$ is not
irreducible, $h = fg$, where neither $f$ nor $g$ is a unit, and so, by Proposition 6.2,
neither $f$ nor $g$ is constant. Hence, Lemma 6.1 gives $0 < \deg(f) < \deg(h)$
and $0 < \deg(g) < \deg(h)$. It follows that $f \notin C$ and $g \notin C$, for their degrees
are too small ($h$ has the smallest degree of polynomials in $C$). Thus, both $f$
and $g$ are products of irreducibles and, hence, $h = fg$ is a product of
irreducibles, contradicting $h \in C$. Therefore, $C$ is empty, and the proposition is
true. ■


Corollary 6.9. If $k$ is a field, then every nonconstant $f(x) \in k[x]$ has a
factorization
\[ f(x) = a\, p_1(x) \cdots p_n(x), \]
where $a$ is a nonzero constant and the $p_i$ are monic irreducibles.

Proof. Apply the result of Exercise 6.8 on page 243 to a factorization of $f$ as
in the proposition. ■

We continue showing that polynomials over fields behave very much like 
integers. Let’s first do some long division. 
                        4x^3 - 14x^2
    x^2 + 3x - 2 ) 4x^5 -  2x^4 +  x^3 ...
                   4x^5 + 12x^4 - 8x^3
                        - 14x^4 + 9x^3


This process can be completed until we get 0 or a remainder of degree $< 2$
(which is it?). Generalizing, there is a Division Algorithm for $R[x]$, where $R$
is any commutative ring: if $a(x), b(x) \in R[x]$ and $a$ is monic, then there are
$q(x), r(x) \in R[x]$ with $b = qa + r$, where $r = 0$ or $\deg(r) < \deg(a)$. The
basic idea is to mimic what we’ve just done.

Proposition 6.10. Let $R$ be a commutative ring and $f(x), g(x) \in R[x]$. If $f$
is monic, then there exist $q(x), r(x) \in R[x]$ with
\[ g = qf + r, \]
where $r = 0$ or $\deg(r) < \deg(f)$.


Proof. Let
\[ f = x^n + a_{n-1} x^{n-1} + \cdots + a_0 \quad\text{and}\quad g = b_m x^m + b_{m-1} x^{m-1} + \cdots + b_0. \]
If $m = \deg(g) < \deg(f) = n$, then take $q = 0$ and $r = g$.

If $m \ge n$, the quotient begins with $b_m x^{m-n}$ multiplied by $f$; now subtract,
getting a polynomial of degree less than $m$. The rest of the proof is by induction
on $m = \deg(g) \ge n$. If
\[ G(x) = g - b_m x^{m-n} f, \]
then either $G = 0$ or $\deg(G) < m = \deg(g)$. If $G = 0$, we are done: set
$q = b_m x^{m-n}$ and $r = 0$. If $G \ne 0$, the inductive hypothesis gives polynomials
$q'$ and $r$ with $G = q'f + r$, where either $r = 0$ or $\deg(r) < \deg(f)$. Therefore,
$g - b_m x^{m-n} f = q'f + r$, and so
\[ g = (b_m x^{m-n} + q')f + r. \qquad ■ \]
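The induction in this proof is really a loop: keep subtracting $b_m x^{m-n} f$ until the degree drops below $n$. Here is a minimal sketch with $R = \mathbb{Z}$ (an assumption made so that ordinary Python integers can serve as coefficients); the function name and the sample dividend $2x^3 + 3x + 1$ are ours. Coefficient lists have the constant term first, and the empty list stands for the zero polynomial.

```python
def divide_by_monic(g, f):
    """Return (q, r) with g = q*f + r and r = [] or deg(r) < deg(f).

    Polynomials are lists of integers, constant term first; f must be monic.
    """
    if not f or f[-1] != 1:
        raise ValueError("f must be monic")
    n = len(f) - 1
    r = list(g)
    q = [0] * max(len(g) - n, 0)
    while len(r) > n:
        b_m = r[-1]                    # leading coefficient of what is left
        shift = len(r) - 1 - n         # the exponent m - n
        q[shift] = b_m
        for i, c in enumerate(f):      # subtract b_m * x^(m-n) * f
            r[shift + i] -= b_m * c
        r.pop()                        # the leading term has cancelled
    while r and r[-1] == 0:
        r.pop()
    return q, r

# Divide 2x^3 + 3x + 1 by the monic polynomial x^2 + 3x - 2.
print(divide_by_monic([1, 3, 0, 2], [-2, 3, 1]))   # ([-6, 2], [-11, 25])
```

So $2x^3 + 3x + 1 = (2x - 6)(x^2 + 3x - 2) + (25x - 11)$. No division of coefficients ever occurs, which is why monic divisors work over any commutative ring.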

When R is a field, we can divide by every nonzero polynomial, not merely 
by monic ones; moreover, the quotient and remainder are unique. 


In $\mathbb{Z}$, if $b < a$, then
$b = 0a + b$; for example,
$27 = 0 \cdot 35 + 27$. Similarly
for polynomials: $x^2 + 1 =
0(x^3 + x^2 - 1) + (x^2 + 1)$.


Theorem 6.11 (Division Algorithm). Let $k$ be a field and $f(x), g(x) \in k[x]$.
If $f \ne 0$, then there exist unique $q(x), r(x) \in k[x]$ with
\[ g = qf + r, \]
where $r = 0$ or $\deg(r) < \deg(f)$.


Proof. We first prove the existence of $q$ and $r$. Now $f = a_n x^n + \cdots + a_0$,
where $a_n \ne 0$. Since $k$ is a field, it contains the inverse $a_n^{-1}$. Hence, $a_n^{-1} f$ is
monic, and Proposition 6.10 gives $q'(x), r(x) \in k[x]$ with
\[ g = q'(a_n^{-1} f) + r, \]
where either $r = 0$ or $\deg(r) < \deg(a_n^{-1} f) = \deg(f)$. Therefore,
\[ g = qf + r, \]
where $q = q' a_n^{-1}$.

To prove uniqueness of $q$ and $r$, assume that $g = Qf + R$, where $R = 0$
or $\deg(R) < \deg(f)$. Then $qf + r = g = Qf + R$, and
\[ (q - Q)f = R - r. \]
If $R \ne r$, then each side, being nonzero, has a degree. Since $k$ is a field, $k[x]$
is a domain (Lemma 5.8), and so
\[ \deg((q - Q)f) = \deg(q - Q) + \deg(f) \ge \deg(f), \]
while $\deg(R - r) \le \max\{\deg(R), \deg(r)\} < \deg(f)$, a contradiction. Hence,
$R = r$ and $(q - Q)f = 0$. As $f \ne 0$, it can be canceled: thus, $q - Q = 0$
and $q = Q$. ■

By Exercise 6.5 on page 243, the uniqueness statement in Theorem 6.11
remains true if we weaken the hypothesis so that $k$ is only a domain.

There is a two-step strategy to determine whether one integer divides an- 
other: first, use the Division Algorithm; then show that the remainder is zero. 
This same strategy can now be used for polynomials. 

Example 6.12. This example shows that quotients and remainders may not
be unique when the coefficients do not lie in a domain. In $\mathbb{Z}_4[x]$, let $b(x) =
2x^3 + 3$ and $a(x) = 2x^2 + 2x + 1$. Then
\begin{align*}
2x^3 + 3 &= (x + 1)(2x^2 + 2x + 1) + (x + 2) \\
         &= (x + 3)(2x^2 + 2x + 1) + x.
\end{align*}
The quotient and remainder in the first equation are $x + 1$ and $x + 2$, while the
quotient and remainder in the second equation are $x + 3$ and $x$. Note that both
$x + 2$ and $x$ are linear, and hence
\[ \deg(x + 2) = \deg(x) = 1 < \deg(a) = 2. \qquad ▲ \]
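Both equations of Example 6.12 can be checked by expanding $qa + r$ with coefficients reduced mod 4. The helpers below are a sketch of ours (constant term first in every coefficient list), not notation from the text.

```python
def poly_mod(f, m):
    """Reduce coefficients mod m and strip leading zeros (constant term first)."""
    f = [c % m for c in f]
    while f and f[-1] == 0:
        f.pop()
    return f

def mul_add(q, a, r, m):
    """Compute q*a + r in Z_m[x]."""
    out = [0] * max(len(q) + len(a) - 1, len(r))
    for i, qi in enumerate(q):
        for j, aj in enumerate(a):
            out[i + j] += qi * aj
    for i, ri in enumerate(r):
        out[i] += ri
    return poly_mod(out, m)

a = [1, 2, 2]          # 2x^2 + 2x + 1
b = [3, 0, 0, 2]       # 2x^3 + 3
print(mul_add([1, 1], a, [2, 1], 4) == b)   # True: quotient x + 1, remainder x + 2
print(mul_add([3, 1], a, [0, 1], 4) == b)   # True: quotient x + 3, remainder x
```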

In forthcoming investigations into roots of unity, we’ll need to know whether
$x^m - 1$ divides $x^n - 1$. Certainly this is true when $m \mid n$ because, if $n = mq$,
\begin{align*}
x^n - 1 &= x^{mq} - 1 \\
        &= (x^m)^q - 1 \\
        &= (x^m - 1)\bigl((x^m)^{q-1} + (x^m)^{q-2} + \cdots + (x^m)^2 + x^m + 1\bigr).
\end{align*}
The converse is also true, and the proof uses the Division Algorithms in $\mathbb{Z}$ and
in $k[x]$.
Proposition 6.13. If $k$ is a field, then $x^m - 1$ divides $x^n - 1$ in $k[x]$ if and only
if $m \mid n$.

Proof. We’ve seen above that $x^m - 1$ divides $x^n - 1$ if $m \mid n$.

Conversely, suppose that $x^m - 1$ divides $x^n - 1$. If $n = mq + r$ where
$0 \le r < m$, then
\begin{align*}
x^n - 1 &= x^{mq+r} - 1 \\
        &= x^{mq+r} - x^r + x^r - 1 \\
        &= x^r(x^{mq} - 1) + (x^r - 1).
\end{align*}
We’re assuming that $x^m - 1$ divides $x^n - 1$ and, as in the discussion just
preceding this proposition, $x^m - 1$ divides $x^{mq} - 1$. Hence, by the 2 out of
3 property for polynomials (Exercise 6.7 on page 243), $x^m - 1$ divides $x^r - 1$.
Since $r < m$, we must have $r = 0$ (why?). ■
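Proposition 6.13 is easy to test for small exponents. The sketch below works with integer coefficients, which suffices for $k = \mathbb{Q}$ because the divisor $x^m - 1$ is monic; the function names and the range of exponents tried are our own choices.

```python
def remainder_by_monic(g, f):
    """Remainder of g on division by a monic f (integer lists, constant term first)."""
    r = list(g)
    n = len(f) - 1
    while len(r) > n:
        lead = r[-1]
        shift = len(r) - 1 - n
        for i, c in enumerate(f):      # subtract lead * x^shift * f
            r[shift + i] -= lead * c
        r.pop()
    return r

def x_power_minus_one(n):
    """Coefficient list of x^n - 1 for n >= 1."""
    return [-1] + [0] * (n - 1) + [1]

def divides(m, n):
    """Does x^m - 1 divide x^n - 1?"""
    return not any(remainder_by_monic(x_power_minus_one(n), x_power_minus_one(m)))

print(all(divides(m, n) == (n % m == 0)
          for m in range(1, 13) for n in range(1, 13)))   # True
```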

Roots 

We are going to apply the preceding results to roots of polynomials. We’ve 
been using the word “root” all along; let’s begin with a formal definition. 

Definition. If $f(x) \in k[x]$, where $k$ is a field, then a root of $f$ in $k$ is an
element $a \in k$ with $f(a) = 0$.


How to Think About It. We have just defined “root in $k$,” not “root.” Often,
a root of a polynomial $f(x) \in k[x]$ may live in a larger field $K$ containing $k$,
but we still call it a root of $f$. For example, $f(x) = x^2 - 2$ has its coefficients
in $\mathbb{Q}$, but we usually say that $\sqrt{2}$ is a root of $f$ even though $\sqrt{2}$ is irrational;
that is, $\sqrt{2} \notin \mathbb{Q}$.


Etymology. Why is a root so called? Just as the Greeks called the bottom
side of a triangle its base (as in the area formula $\frac{1}{2}$ altitude $\times$ base), they also
called the bottom side of a square its base. A natural question for the Greeks
was: given a square of area $A$, what is the length of its side? Of course, the
answer is $\sqrt{A}$. Were we inventing a word for $\sqrt{A}$, we might have called it the
base of $A$ or the side of $A$. Similarly, consider the analogous three-dimensional
question: given a cube of volume $V$, what is the length of its edge? The answer
$\sqrt[3]{V}$ might be called the cube base of $V$, and $\sqrt{A}$ might then be called the
square base of $A$. Why, then, do we call these numbers cube root and square
root? What has any of this to do with plants?

Since tracing the etymology of words is not a simple matter, we only sug- 
gest the following explanation. Through 400 CE, most mathematics was written 
in Greek, but, by the fifth century, India had become a center of mathematics, 
and important mathematical texts were also written in Sanskrit. The Sanskrit 
term for square root is pada. Both Sanskrit and Greek are Indo-European lan- 
guages, and the Sanskrit word pada is a cognate of the Greek word podos; 
both mean base in the sense of the foot of a pillar or, as above, the bottom of a 
square. In both languages, however, there is a secondary meaning “the root of a 
plant.” In translating from Sanskrit, Arab mathematicians chose the secondary
meaning, perhaps in error (Arabic is not an Indo-European language), perhaps
for some unknown reason. For example, the influential book by al-Khwarizmi,
Al-jabr w’al muqabala, which appeared in the year 830 CE, used the Arabic
word jidhr, meaning root of a plant. (The word “algebra” is a European
version of the first word in the title of this book; the author's name has also come
into the English language as the word “algorithm.”) This mistranslation has
since been handed down through the centuries; the term jidhr became standard
in Arabic mathematical writings, and European translations from Arabic into
Latin used the word radix (meaning root, as in radish or radical). The notation
$r2$ for $\sqrt{2}$ occurs in European writings from about the twelfth century (but the
square root symbol did not arise from the letter $r$; it evolved from an old dot
notation). However, there was a competing notation in use at the same time,
for some scholars who translated directly from the Greek denoted $\sqrt{2}$ by $l2$,
where $l$ abbreviates the Latin word latus, meaning “side.” Finally, with the
invention of logarithms in the 1500s, $r$ won out over $l$, for the notation $l2$ was
then commonly used to denote $\log 2$. The passage from square root to cube
root to the root of a polynomial equation other than $x^2 - a$ and $x^3 - a$ is a
natural enough generalization. Thus, as pleasant as it would be, there seems to
be no botanical connection with roots of equations.

(The title Al-jabr w’al muqabala can be translated from Arabic, but the
words already had a technical meaning: both jabr and muqabala refer
to operations akin to subtracting the same number from both sides of
an equation.)


Proposition 6.14 (Remainder Theorem). Let $f(x) \in k[x]$, where $k$ is a field.
If $u \in k$, then there is $q(x) \in k[x]$ with
\[ f(x) = q(x)(x - u) + f(u). \]

Proof. The Division Algorithm gives
\[ f(x) = q(x)(x - u) + r, \]
where either $r = 0$ or $\deg(r) < \deg(x - u) = 1$; hence, the remainder $r$
is a constant. By Corollary 5.21, evaluation at $u$ is a homomorphism; hence,
$f(u) = q(u)(u - u) + r$, and so $f(u) = r$. ■

Proposition 6.14 is often paraphrased to say that $f(u)$ is the remainder after
dividing $f(x)$ by $x - u$.
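Dividing by $x - u$ is the "synthetic division" of high school algebra, and the Remainder Theorem says its last entry is $f(u)$. The following sketch (hypothetical function names, coefficient lists with the constant term first, and a sample polynomial $x^3 - 2x + 5$ of our choosing) compares that last entry with a direct evaluation over $\mathbb{Q}$.

```python
from fractions import Fraction

def divide_by_linear(f, u):
    """Synthetic division of f by x - u.

    f lists coefficients with the constant term first.
    Returns (q, r) with f(x) = q(x)(x - u) + r.
    """
    carry = 0
    partial = []
    for c in reversed(f):              # work down from the leading coefficient
        carry = carry * u + c
        partial.append(carry)
    r = partial.pop()                  # the last entry is the remainder
    return partial[::-1], r

def evaluate(f, u):
    """Evaluate f at u directly from the definition."""
    return sum(c * u ** i for i, c in enumerate(f))

f = [5, -2, 0, 1]                      # x^3 - 2x + 5
print(divide_by_linear(f, 3))          # ([7, 3, 1], 26), so q = x^2 + 3x + 7
print(evaluate(f, 3))                  # 26
half = Fraction(1, 2)
print(divide_by_linear(f, half)[1] == evaluate(f, half))   # True
```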

Here is a connection between roots and factoring. 

Corollary 6.15 (Factor Theorem). Let $f(x) \in k[x]$, where $k$ is a field, and
let $a \in k$. Then $a$ is a root of $f$ in $k$ if and only if $x - a$ divides $f$.

Proof. If $a$ is a root of $f$ in $k$, then $f(a) = 0$, and Proposition 6.14 gives
$f(x) = q(x)(x - a)$. Conversely, if $f(x) = g(x)(x - a)$, then evaluating at
$a$ gives $f(a) = g(a)(a - a) = 0$; that is, $a$ is a root of $f$ in $k$. ■

The next result turns out to be very important. 

Theorem 6.16. Let $k$ be a field. If $f(x) \in k[x]$ has degree $n$, then $f$ has at
most $n$ roots in $k$.

Proof. We prove the statement by induction on $n \ge 0$. If $n = 0$, then $f$ is a
nonzero constant, and the number of its roots in $k$ is zero. Now let $n > 0$. If $f$
has no roots in $k$, we are done, for $0 \le n$. Otherwise, we may assume that $f$
has a root $a$ in $k$. By Corollary 6.15,
\[ f(x) = q(x)(x - a); \]
moreover, $q(x) \in k[x]$ has degree $n - 1$. If there is another root of $f$ in $k$, say
$b$, where $b \ne a$, then evaluating at $b$ gives
\[ 0 = f(b) = q(b)(b - a). \]
Since $b - a \ne 0$, we have $q(b) = 0$ (for $k$ is a field, hence a domain); that is,
$b$ is a root of $q$. But $\deg(q) = n - 1$, so that the inductive hypothesis says that
$q$ has at most $n - 1$ roots in $k$. Therefore, $f$ has at most $n$ roots in $k$, namely
$a$ and the roots of $q$. ■

Example 6.17. Theorem 6.16 is not true for polynomials with coefficients in
an arbitrary commutative ring. For example, the quadratic polynomial
$x^2 - 1$ in $\mathbb{Z}_8[x]$ has four roots in $\mathbb{Z}_8$, namely 1, 3, 5, and 7. On the other hand,
Exercise 6.14 on page 247 says that Theorem 6.16 remains true if we assume
that the coefficient ring is only a domain. ▲
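A direct search confirms the four roots in $\mathbb{Z}_8$ and shows the contrast with a field. The helper roots_mod is our own sketch; it simply evaluates the polynomial at every residue.

```python
def roots_mod(coeffs, m):
    """All a in Z_m with f(a) = 0, where coeffs lists f with the constant term first."""
    return [a for a in range(m)
            if sum(c * a ** i for i, c in enumerate(coeffs)) % m == 0]

print(roots_mod([-1, 0, 1], 8))   # [1, 3, 5, 7]  four roots of x^2 - 1 in Z_8
print(roots_mod([-1, 0, 1], 7))   # [1, 6]        only two in the field Z_7
```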

Recall that every polynomial $f(x) \in k[x]$ determines the polynomial
function $f^\# \in \mathrm{Poly}(k)$, where $f^\# : k \to k$ is defined by $a \mapsto f(a)$ for all $a \in k$.
On page 204, however, we saw that the nonzero polynomial $f(x) = x^p - x \in
\mathbb{F}_p[x]$ determines the constant function zero; different polynomials can
determine the same polynomial function. This pathology vanishes when the field $k$
is infinite.
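To see the pathology concretely, one can tabulate the functions determined by $x^p$ and by $x$ on $\mathbb{Z}_p$. This is only an illustration: the name as_function and the choice $p = 5$ are ours.

```python
def as_function(coeffs, p):
    """Value table of the polynomial function on Z_p (constant term first)."""
    return [sum(c * a ** i for i, c in enumerate(coeffs)) % p for a in range(p)]

p = 5
x_to_the_p = [0] * p + [1]           # the polynomial x^5
x = [0, 1]                           # the polynomial x
print(as_function(x_to_the_p, p))    # [0, 1, 2, 3, 4]
print(as_function(x, p))             # [0, 1, 2, 3, 4]
print(x_to_the_p == x)               # False: different polynomials, same function
```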

Proposition 6.18. Let $k$ be an infinite field and $f(x), g(x) \in k[x]$. If $f$ and
$g$ determine the same polynomial function (that is, $f^\# = g^\#$, so that $f(a) =
g(a)$ for all $a \in k$), then $f = g$.

Proof. If $f \ne g$, then the polynomial $h = f - g$, being nonzero, has a degree,
say $n$. But every element of $k$ is a root of $h$; since $k$ is infinite, $h$ has more than
$n$ roots, and this contradicts Theorem 6.16. ■

This proof yields a more general result. 

Corollary 6.19. Let $k$ be a (possibly finite) field, and let $f(x), g(x) \in k[x]$,
where $\deg(f) \le \deg(g) = n$. If $f(a) = g(a)$ for $n + 1$ elements $a \in k$, then
$f = g$.

Proof. If $f \ne g$, then $\deg(f - g)$ is defined; but $\deg(f - g) \le n$, and so
$f - g$ has too many roots. ■

We can now show that $k[x]$ and $\mathrm{Poly}(k)$ are structurally the same for the
most familiar fields $k$.

Theorem 6.20. If $k$ is an infinite field, then
\[ k[x] \cong \mathrm{Poly}(k). \]

Proof. In Example 5.14(ii), we saw that $\varphi : k[x] \to \mathrm{Poly}(k)$, sending $f \mapsto f^\#$,
is a surjective homomorphism. Since $k$ is infinite, Proposition 6.18 applies to
show that $\varphi$ is injective. Therefore, $\varphi$ is an isomorphism. ■

We now generalize Proposition 6.18 to polynomials in several variables.
Denote the $n$-tuple $(x_1, \ldots, x_n)$ by $X$.

Proposition 6.21. Let $k$ be an infinite field.

(i) If $f(X) \in k[X] = k[x_1, \ldots, x_n]$ is nonzero, then there are $a_1, \ldots, a_n \in
k$ with $f(a_1, \ldots, a_n) \ne 0$.

(ii) If $f(X), g(X) \in k[X]$ and
\[ f(a_1, \ldots, a_n) = g(a_1, \ldots, a_n) \quad\text{for all } (a_1, \ldots, a_n) \in k^n, \]
then $f = g$.


Since $f(a, x_n)$ and
$g(a, x_n)$ lie in $k[x_n]$, we
have $f(a, b), g(a, b) \in k$.


Proof. (i) The proof is by induction on $n \ge 1$. If $n = 1$, then the result is
Proposition 6.18, because $f(a) = 0$ for all $a \in k$ implies $f = 0$. For the
inductive step, assume that
\[ f(x_1, \ldots, x_n) = B_0 + B_1 x_n + B_2 x_n^2 + \cdots + B_r x_n^r, \]
where all $B_i \in k[x_1, \ldots, x_{n-1}]$ and $B_r \ne 0$. By induction, there is $a =
(a_1, \ldots, a_{n-1}) \in k^{n-1}$ with $B_r(a) \ne 0$. Hence, $f(a, x_n) \in k[x_n]$, and
\[ f(a, x_n) = B_0(a) + B_1(a) x_n + \cdots + B_r(a) x_n^r \ne 0. \]
By the base step, there is $a_n \in k$ with $f(a, a_n) \ne 0$.

(ii) The proof is by induction on $n \ge 1$; the base step is Proposition 6.18. For
the inductive step, write
\[ f(X, y) = \sum_i p_i(X) y^i \quad\text{and}\quad g(X, y) = \sum_i q_i(X) y^i, \]
where $X$ denotes $(x_1, \ldots, x_{n-1})$ (by allowing some $p$’s and $q$’s to be
zero, we may assume that both sums involve the same indices $i$). Suppose
that $f(a, b) = g(a, b)$ for every $a \in k^{n-1}$ and every $b \in k$. For fixed
$a \in k^{n-1}$, define $F_a(y) = \sum_i p_i(a) y^i$ and $G_a(y) = \sum_i q_i(a) y^i$. Since
both $F_a(y)$ and $G_a(y)$ are in $k[y]$, the base step gives $p_i(a) = q_i(a)$ for
all $a \in k^{n-1}$. By the inductive hypothesis, $p_i(X) = q_i(X)$ for all $i$, and
hence
\[ f(X, y) = \sum_i p_i(X) y^i = \sum_i q_i(X) y^i = g(X, y). \qquad ■ \]

i i 


Exercises 

6.1 Prove that the only units in $\mathbb{Z}[x]$ are $\pm 1$, and that the only associates of a
polynomial $f(x) \in \mathbb{Z}[x]$ are $\pm f$.

6.2 * Let $R$ be a domain, and let $p(x), q(x) \in R[x]$.

(i) If $p$ and $q$ are irreducible, prove that $p \mid q$ if and only if there is a unit $u$ with
$q = up$.

(ii) If, in addition, both $p$ and $q$ are monic, prove that $p \mid q$ implies $u = 1$ and
$p = q$.
6.3 (i) If R is a domain, prove that the only units in R[x] are units in R.

(ii) The domain Z_2 has only one unit. Give an example of an infinite domain
having only one unit.

6.4 Let R be a commutative ring and let a(x), b(x) ∈ R[x], where a ≠ 0. Prove that
Proposition 6.10 generalizes: if the leading coefficient of a is a unit, then there
exist q(x), r(x) ∈ R[x] with b = qa + r, where either r = 0 or deg(r) < deg(a).

6.5 * Let k be a domain and let a(x), b(x) ∈ k[x], where a ≠ 0. Prove that the
uniqueness statement in the Division Algorithm generalizes: if there are q, r, Q, R
in k[x] with qa + r = b = Qa + R, where r = 0 or deg(r) < deg(a), and where
R = 0 or deg(R) < deg(a), then R = r and Q = q.

6.6 Let k be a domain and let f(x) ∈ k[x]. If a(x) is an associate of f, prove that
deg(f) = deg(a). Give an example to show that the statement may be false if k
is not a domain.

6.7 * Show that there is a “2 out of 3” result for polynomials, analogous to the one for
integers: if k is a field and f, g, h ∈ k[x] are polynomials such that f = g + h,
then a polynomial that divides two of the three will divide the third.

6.8 * Let R be a domain and f(x) ∈ R[x] be nonzero. If f = g_1 ··· g_n, where
g_i(x) ∈ R[x] for all i, show that there exist a nonzero a ∈ R and monic g_i'(x) ∈
R[x] with f = a g_1' ··· g_n'.

6.9 (i) Let f(x), g(x) ∈ Q[x] with f monic. Write a pseudocode (or a program
in a CAS) implementing the Division Algorithm with input f, g and output
q(x), r(x), the quotient and remainder.

(ii) Find the quotient and remainder by dividing x^3 + 2x^2 − 8x + 6 by x − 1 as
you would in high school.

6.10 * If R is a commutative ring, define a relation ≡ on R by a ≡ b if they are
associates. Prove that ≡ is an equivalence relation on R.

6.11 A student claims that x − 1 is not irreducible in Q[x] because there is a
factorization x − 1 = (√x + 1)(√x − 1). Explain the error of his ways.

6.12 * Prove that the ideal (x, y) in k[x, y], where k is a field, is not a principal ideal.

Greatest Common Divisors 

We now introduce gcd’s of polynomials f(x), g(x) ∈ R[x]. It doesn’t make
sense to say that f < g, even when R = ℝ, but it does make sense to say
deg(f) < deg(g). Although some of the coming definitions make sense for
polynomial rings R[x] over a commutative ring R, we will focus our attention
on the rings k[x] for fields k.

Definition. Let k be a field. A common divisor of polynomials a(x), b(x) ∈
k[x] is a polynomial c(x) ∈ k[x] with c | a and c | b. If a and b are not both 0,
define their greatest common divisor, denoted by

gcd(a, b),

to be a monic common divisor d of a and b of largest degree. If a = 0 = b,
define gcd(0, 0) = 0.

The next proposition shows that gcd’s exist; it is true, but not obvious, that
every pair a, b ∈ k[x] has a unique gcd (Theorem 6.30).


Note the convention that
greatest common divisors
are monic. We’ll say more
about this in a moment.

Proposition 6.22. If k is a field and a(x), b(x) ∈ k[x], then a gcd of a, b
exists.

Proof. We saw, in Lemma 6.1, that if c and a are polynomials with c | a,
then deg(c) ≤ deg(a). It follows that gcd’s exist, for common divisors do exist
(1 is always a common divisor), and there is an upper bound on the degrees of
common divisors, namely, max{deg(a), deg(b)}. Finally, a common divisor d
of largest degree can be replaced by a monic associate. ■

Defining gcd’s of polynomials to be monic is just a normalization; after all, 
when we defined gcd’s of integers, we insisted they should be positive. This 
will be needed to prove uniqueness of gcd’s. 

Example 6.23. Here is an easy computation of a gcd, generalizing Lemma 1.17.
Let k be a field and p(x) ∈ k[x] be a monic irreducible polynomial. If b(x) ∈
k[x], then

    gcd(p, b) = p    if p | b,
    gcd(p, b) = 1    otherwise.

A common divisor c of p and b is, in particular, a divisor of p. But the only
monic divisors of p are p and 1, and so gcd(p, b) = p or 1; it is p if p | b,
and it is 1 otherwise. ▲


We are going to see that gcd’s of polynomials are linear combinations. The 
proof of this fact for gcd’s of integers essentially involved ideals in Z, and so 
we now examine ideals in k[x]. 

In any commutative ring R, associates a and b generate the same principal 
ideal (the converse may be false if R is not a domain). 

Proposition 6.24. Let R be a domain and a, b ∈ R. The principal ideals (a)
and (b) are equal if and only if a and b are associates.

Proof. If (a) = (b), then a ∈ (b); hence, a = rb for some r ∈ R, and so
b | a. Similarly, b ∈ (a) implies a | b, and so Proposition 6.6 shows that a and
b are associates.

Conversely, if a = ub, where u is a unit, then a ∈ (b) and (a) ⊆ (b).
Similarly, b = u^{-1}a implies (b) ⊆ (a), and so (a) = (b). ■

Ideals in general commutative rings can be quite complicated, but we have
seen, in Theorem 5.29, that every ideal in Z is principal. When k is a field, all
the ideals in k[x] are also principal.

Theorem 6.25. If k is a field, then every ideal I in k[x] is a principal ideal. In
fact, either I = (0) or there is a unique monic d(x) with I = (d) = {rd :
r ∈ k[x]}.

Proof. If I = (0), then I is a principal ideal with generator 0.
Otherwise, let a(x) be a nonzero polynomial in I of least degree. Since a ∈ k[x]
is nonzero, its leading coefficient c ≠ 0; since k is a field, c^{-1} exists, and
d = c^{-1}a is monic. By Proposition 6.24, (a) = (d).

Clearly, (d) ⊆ I. For the reverse inclusion, let f(x) ∈ I. By the Division
Algorithm, there are q(x), r(x) ∈ k[x] with f = qd + r, where either r = 0
or deg(r) < deg(d). But r = f − qd ∈ I, so that if r ≠ 0, its existence
contradicts d being a nonzero polynomial in I of least degree. Hence, r = 0,
d | f, and f ∈ (d). Therefore, I ⊆ (d), and I = (d).

To prove uniqueness, suppose that d'(x) ∈ k[x] is a monic polynomial
with (d') = (d). By Proposition 6.24, d' and d are associates; there is a unit
u ∈ k[x] with d' = ud. Now u ∈ k, by Proposition 6.2. Since both d' and d
are monic, we have u = 1 and d' = d. ■

Recall Example 5.27(iv): the set I consisting of all polynomials f(x) ∈
ℝ[x] having i as a root is an ideal in ℝ[x] containing (x^2 + 1). We can now
say more.

Corollary 6.26. The ideal I ⊆ ℝ[x] consisting of all polynomials f(x) ∈
ℝ[x] having i as a root is equal to (x^2 + 1).

Proof. Now (x^2 + 1) ⊆ I. For the reverse inclusion, we know that I = (d),
where d is the unique monic polynomial of least degree in I. But x^2 + 1 is a
monic polynomial in I, and there can be no such polynomial of smaller degree
lest i be a root of a linear polynomial in ℝ[x]. ■

It is not true that ideals in arbitrary commutative rings are necessarily prin- 
cipal, as the next example shows. 

Example 6.27. Let R = Z[x], the commutative ring of all polynomials over
Z. It is easy to see that the set I of all polynomials with even constant term is
an ideal in Z[x]. We show that I is not a principal ideal.

Suppose there is d(x) ∈ Z[x] with I = (d). The constant 2 ∈ I, so that
there is f(x) ∈ Z[x] with 2 = df. Since the degree of a product is the sum
of the degrees of the factors, 0 = deg(2) = deg(d) + deg(f). Since degrees
are nonnegative, it follows that deg(d) = 0; i.e., d is a nonzero constant.
As constants here are integers, the candidates for d are ±1 and ±2. Suppose
d = ±2; since x ∈ I, there is g(x) ∈ Z[x] with x = dg = ±2g. But every
coefficient on the right side is even, while the coefficient of x on the left side is
1. This contradiction gives d = ±1. Thus, d is a unit and, by Example 5.30,
I = (d) = Z[x], another contradiction (the constant 1 does not lie in I).
Therefore, no such d exists; that is, I is not a principal ideal. ▲

Recall that if R is any commutative ring and a, b ∈ R, then a linear
combination of a, b is an element of R of the form sa + tb, where s, t ∈ R. Given
a, b, the set I of all linear combinations of a, b is an ideal in R. The next
theorem parallels Theorem 1.19.

Theorem 6.28. If k is a field and f(x), g(x) ∈ k[x], then any gcd of f, g
is a linear combination of f and g; that is, if d(x) is a gcd, then there are
s(x), t(x) ∈ k[x] with

d = sf + tg.


But see Exercise 6.22
on page 248. There
is h(x) ∈ Z[x] with
I = (2, h).

Recall that gcd’s are required
to be monic. That’s
essential to uniqueness.

Proof. The set I of all linear combinations of f and g is an ideal in k[x]; by
Theorem 6.25, there is d(x) ∈ k[x] with I = (d). If both f and g are 0, then
d = 0, and we are done; otherwise, we may assume that d is monic. We know
that d = sf + tg for some s and t, because d lies in I. We claim that d is a
gcd. Now d is a common divisor, for f, g ∈ I = (d). If h is a common divisor
of f and g, then f = f_1 h and g = g_1 h. Hence, d = sf + tg = (s f_1 + t g_1)h
and h | d. Therefore, deg(h) ≤ deg(d), and so d is a monic common divisor
of largest degree. ■
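
As a small illustration of Theorem 6.28 (the polynomials here are chosen only for illustration and do not come from the text), take f = x^2 − 1 = (x − 1)(x + 1) and g = x^2 − 2x + 1 = (x − 1)^2 in Q[x]. Their only monic common divisors are 1 and x − 1, and the difference f − g = 2x − 2 already exhibits the gcd as a linear combination:

```latex
\[
  \gcd(x^2 - 1,\; x^2 - 2x + 1) \;=\; x - 1
  \;=\; \tfrac{1}{2}\,(x^2 - 1) \;-\; \tfrac{1}{2}\,(x^2 - 2x + 1).
\]
```

Here s = 1/2 and t = −1/2 happen to be constants; in general s and t are polynomials.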

We can characterize gcd’s in k[x].

Corollary 6.29. Let k be a field and let f(x), g(x) ∈ k[x]. A monic common
divisor d(x) is a gcd of f, g if and only if d is divisible by every common
divisor; that is, if h is any common divisor of f, g, then h | d.

Proof. The end of the proof of Theorem 6.28 shows that if h is a common
divisor, then h | d. Conversely, if h | d for every common divisor h, then
deg(h) ≤ deg(d), and so d is a monic common divisor of largest degree. ■

Theorem 6.30. Let f(x), g(x) ∈ k[x], where k is a field, and let I = (f, g)
be the ideal of all linear combinations of f and g.

(i) If d(x) ∈ k[x] is monic, then d = gcd(f, g) if and only if I = (d).

(ii) f and g have a unique gcd.

Proof. (i) Suppose that d = gcd(f, g). We show that (d) ⊆ I and I ⊆
(d). Theorem 6.28 shows that d ∈ I; therefore, (d) ⊆ I (for every
multiple of d is also a linear combination). For the reverse inclusion, let
h = uf + vg ∈ I. Now d | f and d | g, because d is a common divisor,
and so d | h. Hence, h = rd ∈ (d); that is, I ⊆ (d), and so I = (d).

Conversely, suppose that I = (d). Then d is a common divisor, for
f, g ∈ I = (d). Moreover, d = sf + tg for some s and t, and so every
common divisor h of f, g is a divisor of d. Hence, Corollary 6.29 gives
d = gcd(f, g).

(ii) If d and d' are gcd’s of f and g, then (d) = (d'), by part (i). Since both
d and d' are monic, we must have d = d', by Theorem 6.25. ■


How to Think About It. It’s a good idea to stop and take stock of where we 
are in our program of displaying parallels between integers and polynomials. 
For polynomials over a field, we have, so far 

• extended the notion of divisibility 

• generalized “prime” to “irreducible” 

• shown that factorizations into irreducibles exist 

• established a division algorithm 

• shown that the gcd of two polynomials exists and is unique 

• shown that the gcd of two polynomials is a linear combination of them. 

Thinking back to Chapter 1, what’s next? There were two main paths we took 
then: one led to unique factorization — the Fundamental Theorem of Arith- 
metic; one led to Euclidean Algorithms. We’ll follow both these paths for poly- 
nomials. 
Exercises

6.13 Find the gcd of each pair (f, g) in Q[x] and write it as a linear combination of f
and g.

(i) (x^3 − x^2 − x − 2, x^3 − 3x^2 + 3x − 2)

(ii) (x^6 − 1, x^5 − 1)

(iii) (x^3 − x^2 − x − 2, 2x^3 − 4x^2 + 2x − 4)

(iv) (x^6 − 1, x^6 + x^5 − 2)

(v) ((2x + 1)(x^6 − 1), (2x + 1)(x^5 − 1))

(vi) (3x^6 − 3, 2x^5 − 2)

6.14 * Let R be a domain. If f(x) ∈ R[x] has degree n, prove that f has at most n
roots in R.

Hint. Use Frac(R).

6.15 If k is a field in which 1 + 1 ≠ 0, prove that √(1 − x^2) is not a rational function
over k.

Hint. Mimic the classical proof that √2 is irrational.

6.16 In Exercise 6.10 on page 243, we saw that the relation ≡ on a commutative ring R,
defined by a ≡ b if they are associates, is an equivalence relation. Prove that if R
is a domain, then there is a bijection from the family of all equivalence classes to
the family of all principal ideals in R.

6.17 *

(i) If f(x) and g(x) are relatively prime in k[x] (k a field) and each divides a
polynomial h, prove that their product fg also divides h.

(ii) If p_1, p_2, ..., p_n are polynomials so that gcd(p_i, p_j) = 1 for all i ≠ j, and
each p_i divides a polynomial h, prove that p_1 p_2 ··· p_n also divides h.

6.18 * 

(i) Find gcd(3x^3 − 2x^2 + 3x − 2, 3x^2 + x − 2) in C[x].

(ii) Write a pseudocode (or a program in a CAS) implementing Euclidean
Algorithm I.

(iii) Write a pseudocode (or a program in a CAS) implementing Euclidean
Algorithm II.

Hint. Model your routine after the functions in Exercise 1.67 on page 36.

6.19 * Prove the converse of Euclid’s Lemma. Let k be a field and let f(x) ∈ k[x] be a
nonconstant polynomial; if, whenever f divides a product of two polynomials, it
necessarily divides one of the factors, then f is irreducible. (See Theorem 1.21.)

6.20 (i) Find two polynomials in Q[x] whose associated polynomial functions agree
with this input-output table:

    Input    Output
      1         3
      4        17
      5        26

(ii) Classify the set of all polynomials that agree on the table.

6.21 (i) Show that the set of polynomials in Q[x] that vanish on {1, 2, 3} is an ideal in
Q[x].

(ii) What is a generator of this ideal?
6.22 * In Example 6.27, we saw that the ideal I in Z[x] consisting of all polynomials
with even constant term is not a principal ideal. Find a polynomial h(x) ∈ Z[x]
so that I = (2, h); that is, I consists of all the linear combinations of 2 and h.

6.23 Let k be a field and f(x), g(x) ∈ k[x]. Generalize Exercises 5.49 and 5.50 on
page 220: if d(x) = gcd(f, g) and m(x) = lcm(f, g), prove that

(f) + (g) = (d) and (f) ∩ (g) = (m).

6.24 Show, in Z_8[x], that x^2 − 1 has two distinct factorizations into irreducibles.

Hint. See Example 6.17.

Unique Factorization 

The main result in this subsection is a generalization of the Fundamental Theo- 
rem of Arithmetic to polynomials: the factorization of every nonconstant poly- 
nomial over a field as a product of irreducibles is essentially unique. 

We begin by proving Euclid’s Lemma for polynomials. As for integers, it 
shows that irreducibility is a strong assumption when dealing with divisibility. 

Theorem 6.31 (Euclid’s Lemma). Let k be a field and let f(x), g(x) ∈ k[x].
If p(x) is an irreducible polynomial in k[x] and p | fg, then

p | f or p | g.

More generally, if p | f_1 ··· f_n, then p | f_i for some i.

Proof. Assume that p | fg but that p ∤ f. Since p is irreducible, gcd(p, f) =
1, and so 1 = sp + tf for some polynomials s(x) and t(x). Therefore,

g = spg + tfg.

But p | fg, by hypothesis, and so Exercise 6.7 on page 243 gives p | g. The
last statement follows by induction on n ≥ 2. ■

The converse of Euclid’s Lemma is true; see Exercise 6.19 on page 247. 

Polynomial versions of arithmetic theorems in Chapter 1 now follow. 

Definition. Two polynomials f(x), g(x) ∈ k[x], where k is a field, are called
relatively prime if their gcd is 1.

Corollary 6.32. Let f(x), g(x), h(x) ∈ k[x], where k is a field, and let h and
f be relatively prime. If h | fg, then h | g.

Proof. The proof of Theorem 6.31 works here. Since gcd(h, f) = 1, we have
1 = sh + tf, and so g = shg + tfg. But fg = h h_1 for some h_1(x) ∈ k[x],
and so g = h(sg + t h_1). ■

Definition. If k is a field, then a rational function f(x)/g(x) ∈ k(x) is in
lowest terms if f and g are relatively prime.

Proposition 6.33. If k is a field, every nonzero f(x)/g(x) ∈ k(x) can be put
in lowest terms.

Proof. If f = df' and g = dg', where d = gcd(f, g), then f' and g' are
relatively prime, and so f'/g' is in lowest terms. ■

There is an analog of the Euclidean Algorithm in Z that can be applied to 
compute gcd’s of polynomials. 

Theorem 6.34 (Euclidean Algorithm I). If k is a field and f(x), g(x) ∈ k[x],
then there is an algorithm computing gcd(f, g).

Proof. The proof is essentially a repetition of the proof of the Euclidean
Algorithm in Z; just iterate the Division Algorithm. Each line comes from the line
above it by moving some terms “southwest.”

    g = q_1 f + r_1
    f = q_2 r_1 + r_2
    r_1 = q_3 r_2 + r_3
        ⋮
    r_{n−3} = q_{n−1} r_{n−2} + r_{n−1}
    r_{n−2} = q_n r_{n−1} + r_n
    r_{n−1} = q_{n+1} r_n.

Since the degrees of the remainders are strictly decreasing, the procedure must
stop after at most deg(f) steps. The claim is that d = r_n is the gcd, once
it is made monic. We see that d is a common divisor of f and g by back
substitution: repeated applications of “2 out of 3,” working from the bottom
up. To see that d is the gcd, work from the top down to show that if c is any
common divisor of f and g, then c | r_i for every i. ■

The Euclidean Algorithm may not produce a monic last remainder. The gcd 
is the monic associate of the last nonzero remainder. 

Example 6.35 (Good Example). Let

f(x) = 3x^3 − 2x^2 + 3x − 2 and g(x) = 3x^2 + x − 2;

we compute gcd(f, g).

    3x^3 − 2x^2 + 3x − 2 = (x − 1)(3x^2 + x − 2) + (6x − 4)
    3x^2 + x − 2 = ((1/2)x + 1/2)(6x − 4) + 0.

Rewriting in simpler notation:

    f = (x − 1)g + r
    g = ((1/2)x + 1/2)r.

The last remainder is 6x − 4. As we warned, it’s not monic, and we must make
it so. Thus, we need to take its monic associate (multiplying by 1/6):

gcd(f, g) = x − 2/3. ▲
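
The iteration in Theorem 6.34 can be sketched in Python. This is only an illustrative sketch, not a routine from the text: it assumes k = Q, stores a polynomial as a list of coefficients with the constant term first (so 6x − 4 is [-4, 6] and the zero polynomial is []), and the names trim, poly_divmod, and poly_gcd are our own.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients; the zero polynomial becomes []."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_divmod(b, a):
    """Division Algorithm in Q[x]: return (q, r) with b = q*a + r,
    where r = 0 or deg(r) < deg(a)."""
    a = trim([Fraction(c) for c in a])
    r = trim([Fraction(c) for c in b])
    if not a:
        raise ZeroDivisionError("division by the zero polynomial")
    q = [Fraction(0)] * max(len(r) - len(a) + 1, 0)
    while len(r) >= len(a):
        shift = len(r) - len(a)
        coeff = r[-1] / a[-1]      # leading coefficients divide: k is a field
        q[shift] = coeff
        for i, c in enumerate(a):  # subtract coeff * x^shift * a from r
            r[i + shift] -= coeff * c
        r = trim(r)
    return trim(q), r

def poly_gcd(f, g):
    """Euclidean Algorithm I: monic associate of the last nonzero remainder."""
    f = trim([Fraction(c) for c in f])
    g = trim([Fraction(c) for c in g])
    while g:
        _, r = poly_divmod(f, g)
        f, g = g, r
    if not f:
        return []                  # gcd(0, 0) = 0 by convention
    return [c / f[-1] for c in f]  # divide by the leading coefficient

# The polynomials of Example 6.35: f = 3x^3 - 2x^2 + 3x - 2, g = 3x^2 + x - 2.
print(poly_divmod([-2, 3, -2, 3], [-2, 1, 3]))
print(poly_gcd([-2, 3, -2, 3], [-2, 1, 3]))
```

The first call reproduces the quotient x − 1 and remainder 6x − 4 of Example 6.35; the second returns the coefficients of x − 2/3. Exact rational arithmetic (Fraction) is used so that no rounding interferes with the test for a zero remainder.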
It’s the same for ordinary
long division: hand calculations
for small integers
are quite simple, but they
get very messy when
dividing two large integers.


See Exercise 6.18(i) on
page 247.


Example 6.36 (Bad Example). The Euclidean Algorithm applied to integers 
is quite efficient, in terms of the number of steps it takes to get to the answer. 
It’s the same for polynomials, but the steps get quite cumbersome when carried 
out by hand — the complexity comes from the computational overhead in the 
hand calculations, not in the efficiency of the algorithm itself. A CAS removes 
this obstacle. 

The following steps calculate gcd(x^4 − x^3 − 5x^2 + 8x − 4, 3x^3 − 6x^2 + x − 2)
via the Euclidean Algorithm; all the quotients and remainders were calculated
with a CAS:

    x^4 − x^3 − 5x^2 + 8x − 4 = ((1/3)x + 1/3)(3x^3 − 6x^2 + x − 2)
                                  + (−(10/3)x^2 + (25/3)x − 10/3)

    3x^3 − 6x^2 + x − 2 = (−(9/10)x − 9/20)(−(10/3)x^2 + (25/3)x − 10/3)
                            + ((7/4)x − 7/2)

    −(10/3)x^2 + (25/3)x − 10/3 = (−(40/21)x + 20/21)((7/4)x − 7/2).

Multiplying the last nonzero remainder, (7/4)x − 7/2, by 4/7 produces the gcd,
x − 2. ▲

Here is an unexpected bonus from the Euclidean Algorithm. 

Corollary 6.37. Let k be a subfield of a field K, so that k[x] is a subring
of K[x]. If f(x), g(x) ∈ k[x], then their gcd in k[x] is equal to their gcd in
K[x].

Proof. We may assume that f ≠ 0, for gcd(0, g) = g (actually, g’s monic
associate). The Division Algorithm in K[x] gives

g = Qf + R,

where Q, R ∈ K[x] and either R = 0 or deg(R) < deg(f); since f, g ∈ k[x],
the Division Algorithm in k[x] gives

g = qf + r,

where q, r ∈ k[x] and either r = 0 or deg(r) < deg(f). But the equation
g = qf + r also holds in K[x] because k[x] ⊆ K[x], so that the uniqueness
of quotient and remainder in the Division Algorithm in K[x] gives Q = q ∈
k[x] and R = r ∈ k[x]. Therefore, the list of equations occurring in the
Euclidean Algorithm in K[x] is exactly the same as the list occurring in the
Euclidean Algorithm in the smaller ring k[x]. In particular, the gcd, being the
last remainder (made monic), is the same in both polynomial rings. ■

To illustrate, even though there are more divisors with complex coefficients,
the gcd of 3x^3 − 2x^2 + 3x − 2 and 3x^2 + x − 2, computed in ℝ[x], is equal
to their gcd computed in C[x].

As in Z, the Division Algorithm in k[x] can also be used to compute coef- 
ficients occurring in an expression of the gcd as a linear combination. 

Theorem 6.38 (Euclidean Algorithm II). If k is a field and f(x), g(x) ∈
k[x], then there is an algorithm finding a pair of polynomials s(x) and t(x)
with gcd(f, g) = sf + tg.

Proof. Let d = gcd(f, g). To find s and t with d = sf + tg, again work from
the last remainder back to f and g:

    r_n = r_{n−2} − q_n r_{n−1}
        = r_{n−2} − q_n (r_{n−3} − q_{n−1} r_{n−2})
        = (1 + q_n q_{n−1}) r_{n−2} − q_n r_{n−3}
          ⋮
        = sf + tg. ■

Example 6.39. Let’s compute gcd(f, g), where f(x) = x^3 − 2x^2 + x − 2 and
g(x) = x^4 − 1.

    x^4 − 1 = (x + 2)(x^3 − 2x^2 + x − 2) + (3x^2 + 3)
    x^3 − 2x^2 + x − 2 = ((1/3)x − 2/3)(3x^2 + 3) + 0.

Rewriting in simpler notation:

    g = (x + 2)f + r
    f = ((1/3)x − 2/3)r.

We see that the last nonzero remainder is r = 3x^2 + 3. As we warned, it need
not be monic, and we must make it so. Thus, gcd(f, g) = x^2 + 1.

We now use Euclidean Algorithm II to find s(x), t(x) with d = sf + tg.
Using letters, write the first division as g = qf + r; then

    r = g − qf.

Now set q = x + 2 and r = 3x^2 + 3. We have

    3x^2 + 3 = −(x + 2)f + g.

Since gcd’s are monic,

    x^2 + 1 = −(1/3)(x + 2)f + (1/3)g.

A computer can be programmed to carry out Euclidean Algorithm II (see
Exercise 6.18 on page 247). Once programmed, messy calculations are not a
problem. Indeed, using the polynomials from Example 6.36, we have

gcd(x^4 − x^3 − 5x^2 + 8x − 4, 3x^3 − 6x^2 + x − 2) = x − 2.

Working as above (with the help of a CAS), we get

    ((9/10)x + 9/20)(x^4 − x^3 − 5x^2 + 8x − 4)
      + (−(3/10)x^2 − (9/20)x + 17/20)(3x^3 − 6x^2 + x − 2)
      = (7/4)x − 7/2.

Multiplying both sides of this equation by 4/7 gives the linear combination. ▲
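
The back substitution of Theorem 6.38 can also be carried out while the divisions are performed, by tracking how each remainder is written in terms of f and g. The following Python sketch does this for k = Q; it is an assumption-laden illustration rather than a routine from the text (coefficient lists with the constant term first, and the helper names add, scale, mul, poly_divmod, and poly_xgcd are ours).

```python
from fractions import Fraction

# Polynomials over Q are coefficient lists, constant term first; [] is 0.

def trim(p):
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    n = max(len(p), len(q))
    p = list(p) + [Fraction(0)] * (n - len(p))
    q = list(q) + [Fraction(0)] * (n - len(q))
    return trim([a + b for a, b in zip(p, q)])

def scale(p, c):
    return trim([c * a for a in p])

def mul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def poly_divmod(b, a):
    """Return (q, r) with b = q*a + r and r = 0 or deg(r) < deg(a)."""
    a = trim([Fraction(c) for c in a])
    r = trim([Fraction(c) for c in b])
    if not a:
        raise ZeroDivisionError("division by the zero polynomial")
    q = [Fraction(0)] * max(len(r) - len(a) + 1, 0)
    while len(r) >= len(a):
        shift = len(r) - len(a)
        coeff = r[-1] / a[-1]
        q[shift] = coeff
        for i, c in enumerate(a):
            r[i + shift] -= coeff * c
        r = trim(r)
    return trim(q), r

def poly_xgcd(f, g):
    """Euclidean Algorithm II: return (d, s, t) with d = gcd(f, g) monic
    and d = s*f + t*g."""
    r0 = trim([Fraction(c) for c in f])
    r1 = trim([Fraction(c) for c in g])
    s0, s1 = [Fraction(1)], []     # invariant: r0 = s0*f + t0*g
    t0, t1 = [], [Fraction(1)]     #            r1 = s1*f + t1*g
    while r1:
        q, r = poly_divmod(r0, r1)
        r0, r1 = r1, r
        s0, s1 = s1, add(s0, scale(mul(q, s1), -1))
        t0, t1 = t1, add(t0, scale(mul(q, t1), -1))
    if not r0:
        return [], [], []          # gcd(0, 0) = 0
    c = 1 / r0[-1]                 # normalize the last remainder to be monic
    return scale(r0, c), scale(s0, c), scale(t0, c)

# The polynomials of Example 6.36.
f = [-4, 8, -5, -1, 1]             # x^4 - x^3 - 5x^2 + 8x - 4
g = [-2, 1, -6, 3]                 # 3x^3 - 6x^2 + x - 2
d, s, t = poly_xgcd(f, g)
print(d, s, t)
```

For these inputs d is x − 2, and s, t are the coefficients displayed above multiplied by 4/7. The pair (s, t) is not unique; a different order of division or a different algorithm may return another valid pair.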
The next result, an analog for polynomials of the Fundamental Theorem
of Arithmetic, shows that the factorization of a polynomial as a product of
irreducible polynomials is essentially unique.

Theorem 6.40 (Unique Factorization). If k is a field, then every nonconstant
polynomial f(x) ∈ k[x] is a product of a nonzero constant and monic
irreducibles.

Moreover, if f has two such factorizations,

f = a p_1 ··· p_m and f = b q_1 ··· q_n

(that is, a and b are nonzero constants and the p’s and q’s are monic
irreducibles), then a = b, m = n, and the q’s may be re-indexed so that q_i = p_i for
all i.

Proof. We proved the existence of a factorization in Corollary 6.9.

To prove uniqueness, suppose that there is an equation

a p_1 ··· p_m = b q_1 ··· q_n

in which a and b are nonzero constants and the p’s and q’s are monic
irreducibles. We prove, by induction on M = max{m, n} ≥ 1, that a = b,
m = n, and the q’s may be re-indexed so that q_i = p_i for all i. For the base
step M = 1, we have a p_1 = b q_1. Now a is the leading coefficient, because
p_1 is monic, while b is the leading coefficient, because q_1 is monic. Therefore,
a = b, and canceling gives p_1 = q_1. For the inductive step, the given equation
shows that p_m | q_1 ··· q_n. By Euclid’s Lemma for polynomials, there is some
i with p_m | q_i. But q_i, being monic irreducible, has no monic divisors other
than 1 and itself, so that q_i = p_m. Re-indexing, we may assume that q_n = p_m.
Canceling this factor, we have a p_1 ··· p_{m−1} = b q_1 ··· q_{n−1}. By the inductive
hypothesis, a = b, m − 1 = n − 1 (hence m = n) and, after re-indexing,
q_i = p_i for all i. ■

Here is another way to state uniqueness, using Proposition 6.24: after
re-indexing, the ideals (p_1), ..., (p_m) and (q_1), ..., (q_m) are the same.

Collect like factors. 

Definition. Let f(x) ∈ k[x], where k is a field. A prime factorization of f is

f(x) = a p_1(x)^{e_1} ··· p_m(x)^{e_m},

where a is a nonzero constant, the p_i’s are distinct monic irreducible
polynomials, and e_i ≥ 0 for all i.

Theorem 6.40 shows that every nonconstant polynomial f has prime
factorizations; moreover, if all the exponents e_i > 0, then the factors in it are
unique. Let f(x), g(x) ∈ k[x], where k is a field. As with integers, using zero
exponents allows us to assume that the same irreducible factors occur in both
prime factorizations:

f = p_1^{a_1} ··· p_n^{a_n} and g = p_1^{b_1} ··· p_n^{b_n}.
Definition. If f and g are elements in a commutative ring R, then a common
multiple is an element h ∈ R with f | h and g | h. If f and g in R are not
both 0, define their least common multiple, denoted by

lcm(f, g),

to be a monic common multiple c of them with c | h for every common
multiple h. If f = 0 = g, define their lcm to be 0.

We now use prime factorizations having zero exponents. 

Proposition 6.41. Let f(x), g(x) ∈ k[x], where k is a field, have prime
factorizations f = p_1^{a_1} ··· p_n^{a_n} and g = p_1^{b_1} ··· p_n^{b_n} in k[x], where
a_i, b_i ≥ 0 for all i.

(i) f | g if and only if a_i ≤ b_i for all i.

(ii) If m_i = min{a_i, b_i} and M_i = max{a_i, b_i}, then

gcd(f, g) = p_1^{m_1} ··· p_n^{m_n} and lcm(f, g) = p_1^{M_1} ··· p_n^{M_n}.

Proof. (i) If f | g, then g = fh, where h = p_1^{c_1} ··· p_n^{c_n} and c_i ≥ 0 for
all i. Hence,

g = p_1^{b_1} ··· p_n^{b_n} = (p_1^{a_1} ··· p_n^{a_n})(p_1^{c_1} ··· p_n^{c_n}) = p_1^{a_1+c_1} ··· p_n^{a_n+c_n}.

By uniqueness, a_i + c_i = b_i; hence, a_i ≤ a_i + c_i = b_i. Conversely,
if a_i ≤ b_i, then there is c_i ≥ 0 with b_i = a_i + c_i. It follows that h =
p_1^{c_1} ··· p_n^{c_n} ∈ k[x] and g = fh.

(ii) Let d = p_1^{m_1} ··· p_n^{m_n}. Now d is a common divisor, for m_i ≤ a_i, b_i. If
D = p_1^{e_1} ··· p_n^{e_n} is any other common divisor, then 0 ≤ e_i ≤ min{a_i, b_i} =
m_i, and so D | d. Therefore, deg(D) ≤ deg(d), and d is the gcd (for it
is monic). The argument for lcm is similar. ■

Corollary 6.42. If k is a field and f(x), g(x) ∈ k[x] are monic polynomials,
then

lcm(f, g) gcd(f, g) = fg.

Proof. The result follows from Proposition 6.41, for m_i + M_i = a_i + b_i. ■

Since the Euclidean Algorithm computes the gcd in k[x] when k is a field,
Corollary 6.42 computes the lcm:

lcm(f, g) = fg / gcd(f, g).
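
Proposition 6.41 reduces divisibility, gcd’s, and lcm’s of factored polynomials to arithmetic on exponents. A minimal Python sketch of that bookkeeping follows; it assumes each monic polynomial is given as a dictionary from labels of distinct monic irreducibles to exponents (the string labels and the function names are hypothetical, and nothing here checks that a label really names an irreducible polynomial).

```python
def divides(f, g):
    """f | g if and only if a_i <= b_i for every irreducible p_i."""
    return all(a <= g.get(p, 0) for p, a in f.items())

def gcd_exponents(f, g):
    """Exponents m_i = min{a_i, b_i} of gcd(f, g)."""
    return {p: min(f.get(p, 0), g.get(p, 0)) for p in set(f) | set(g)}

def lcm_exponents(f, g):
    """Exponents M_i = max{a_i, b_i} of lcm(f, g)."""
    return {p: max(f.get(p, 0), g.get(p, 0)) for p in set(f) | set(g)}

# Over Q: f = (x - 1)^2 (x + 1) and g = (x - 1)(x^2 + 1).
f = {"x-1": 2, "x+1": 1}
g = {"x-1": 1, "x^2+1": 1}

d = gcd_exponents(f, g)   # x - 1
m = lcm_exponents(f, g)   # (x - 1)^2 (x + 1)(x^2 + 1)

# Corollary 6.42 at the level of exponents: m_i + M_i = a_i + b_i.
print(all(d[p] + m[p] == f.get(p, 0) + g.get(p, 0) for p in d))
```

Missing labels are treated as exponent 0, which is exactly the zero-exponent convention used to give f and g the same list of irreducible factors. In practice the factorizations are rarely available, which is why the Euclidean Algorithm is the usual way to compute a gcd.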


We can use roots to detect whether two polynomials are relatively prime. 


Corollary 6.43. If f(x), g(x) ∈ ℝ[x] have no common root in C, then f, g
are relatively prime in ℝ[x].

Proof. Assume that d = gcd(f, g) ≠ 1, where d ∈ ℝ[x]. By the Fundamental
Theorem of Algebra, d has a complex root a. By Corollary 6.37,
d = gcd(f, g) in C[x]. Since (x − a) | d in C[x], we have (x − a) | f
and (x − a) | g. By Corollary 6.15, a is a common root of f and g. ■
How to Think About It. There is nothing magic about ℝ and C. It can be
proved that every field k has an algebraic closure k̄; that is, there is a field k̄
containing k as a subfield, and every f(x) ∈ k̄[x] is a product of linear factors.
In particular, since k[x] ⊆ k̄[x], every f(x) ∈ k[x] is a product of linear
factors in k̄[x]; that is, k̄ contains all the roots of f. Thus, Corollary 6.43 can
be generalized by replacing ℝ and C by k and k̄.

We know that C can be viewed as a vector space over its subfield ℝ, and
dim_ℝ(C) = 2. But things are not so simple for algebraic closures k̄ of other
fields k. It is always true that k̄ is a vector space over k, but its dimension need
not be 2. In fact, dim_k(k̄) need not even be finite: for example, dim_Q(Q̄) = ∞
and, if k is finite, then dim_k(k̄) = ∞.


Let k be a field, and assume that all the roots of a polynomial f(x) ∈ k[x]
lie in k: there are a, r_1, ..., r_n ∈ k with

f(x) = a ∏_{i=1}^{n} (x − r_i).

If r_1, ..., r_s, where s ≤ n, are the distinct roots of f, then a prime factorization
of f is

f(x) = a(x − r_1)^{e_1} (x − r_2)^{e_2} ··· (x − r_s)^{e_s}.

We call e_j the multiplicity of the root r_j. As linear polynomials are always
irreducible, unique factorization shows that multiplicities of roots are
well-defined.
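
The multiplicity of a root r can be computed by dividing by x − r for as long as the remainder is zero. The sketch below assumes k = Q and coefficient lists with the constant term first; the function names are our own, and the division by a linear factor is done by the high school method of synthetic division.

```python
from fractions import Fraction

def divide_by_linear(p, r):
    """Divide p(x) by (x - r); return (quotient, remainder).
    Coefficients are listed constant term first."""
    acc = []
    carry = Fraction(0)
    for c in reversed(p):          # work from the leading coefficient down
        carry = carry * r + c
        acc.append(carry)
    remainder = acc.pop()          # the final value equals p(r)
    return acc[::-1], remainder

def multiplicity(p, r):
    """Largest e with (x - r)^e dividing the nonzero polynomial p."""
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:        # drop zero leading coefficients
        p.pop()
    if not p:
        raise ValueError("the zero polynomial has no well-defined multiplicity")
    r = Fraction(r)
    e = 0
    while len(p) > 1:
        q, rem = divide_by_linear(p, r)
        if rem != 0:
            break
        p, e = q, e + 1
    return e

# f(x) = (x - 1)^2 (x + 2) = x^3 - 3x + 2
f = [2, -3, 0, 1]
print(multiplicity(f, 1), multiplicity(f, -2), multiplicity(f, 3))
```

A return value of 0 means that r is not a root at all.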

Exercises 

6.25 Let f(x), g(x) ∈ k[x], where k is a field. If fg is a square, must f or g be a
square? What if gcd(f, g) = 1?

6.26 Let f(x), g(x) ∈ k[x], where k is a field, be relatively prime. If h(x) ∈ k[x] and
h^2 | fg, prove that h^2 | f or h^2 | g.

6.27 Let k = F_2(x). Prove that f(t) = t^2 − x ∈ k[t] is an irreducible polynomial. (We
shall see later that there is a field K containing k and an element u with u^2 = x,
so that f(t) = (t − u)^2 in K[t].)

6.28 In Z_p[x], show that if f is an irreducible factor of x^{p^n} − x, then f^2 does not
divide x^{p^n} − x.

6.29 Determine, for each of the following polynomials in Q[x], whether or not it is
irreducible in Q[x], in ℝ[x], or in C[x].

(i) x 2 — lx + 6. 

(ii) x^2 + 2x − 1.

(iii) x^2 + x + 1.

6.30 * Show that f(x) = x^3 + 5x^2 − 10x + 15 is irreducible in Q[x].

In Section 6.2, we will give different criteria for determining whether
polynomials are irreducible (in particular, we will discuss f on page 267). However,
we ask you to solve this problem now so you will appreciate the theorems to be
proved.
Principal Ideal Domains

There are other classes of domains that enjoy an analog of the Fundamental 
Theorem of Arithmetic; one such is the following. 

Definition. A domain R is a principal ideal domain, usually abbreviated by 
the acronym PID, if every ideal in R is a principal ideal. 

We have already seen examples of PIDs. 

Example 6.44. (i) Theorem 5.29 shows that Z is a PID. 

(ii) Theorem 6.25 shows that k[x] is a PID when k is a field. 

(iii) Every field k is a PID, for its only ideals are k = (1) and (0). 

(iv) Not every domain is a PID. In Example 6.27, we saw that there are ideals 
in Z[x] that are not principal ideals. 

(v) Exercise 6.12 on page 243 shows that k[x, y], polynomials in two
variables over a field k, is not a PID.

(vi) In Chapter 8, we shall see that the rings of Gaussian integers and of
Eisenstein integers are PIDs. ▲

PIDs enjoy many of the properties shared by Z and k[x] (k a field). In 
particular, they have a fundamental theorem of arithmetic, and the proof of 
this fact parallels the program we developed for Z and k [x] . 

We begin by defining gcd’s in a general commutative ring R. We can’t use
< as we did in Z, nor can we use degrees as we did in k[x], but we can use the
idea in Corollaries 1.20 and 6.29.

Definition. Let R be a commutative ring. If a, b e R, then a gcd of a, b is a 
common divisor d e R that is divisible by every common divisor: if c \ a and 
c | b, then c \ d. 

Just defining a term doesn’t guarantee it always exists — we could define 
unicorn if we were asked to do so, and there are rings, even domains, con- 
taining elements having no gcd (see Exercise 6.33 on page 259). Even if a 
gcd does exist, there is the question of uniqueness. In Z, uniqueness of a gcd 
follows from our assuming, as part of the definition, that gcd’s are positive; 
in k[x], uniqueness follows from our assuming, as part of the definition, that
gcd’s are monic. Neither assumption makes sense in a general commutative 
ring; however, we do have a measure of uniqueness in domains. 

Proposition 6.45. Let R be a domain. If d and d' are gcd’s of a, b in R, then
d and d' are associates and (d) = (d').

Proof. By definition, both d and d' are common divisors of a, b; moreover,
d | d' and d' | d. Since R is a domain, Proposition 6.6 applies, and d and d'
are associates. The second statement follows from Proposition 6.24. ■

Although there are domains with elements not having a gcd, we now show 
gcd’s always exist in PIDs. 
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This proof should look 
quite familiar to you. 


Theorem 6.46. Let R be a PID. If a, b e R, then a gcd of a, b exists and it is 
a linear combination of a, b. 

Proof. Like every ideal in $R$, the ideal $(a, b) = \{ua + vb : u, v \in R\}$ is a
principal ideal: there is $d \in R$ with $(a, b) = (d)$. Of course, $d$ is a linear
combination of $a, b$, say, $d = sa + tb$ for some $s, t \in R$, and it suffices to
prove $d$ is a gcd. Now $d$ is a common divisor: $a \in (a, b) = (d)$, so that
$a = rd$ for some $r \in R$; hence, $d \mid a$; similarly, $d \mid b$. Finally, if $c \mid a$ and
$c \mid b$, then $c \mid sa + tb = d$. ■
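In the PID $\mathbb{Z}$ the coefficients $s, t$ of Theorem 6.46 can be computed explicitly. The following is a minimal sketch of the extended Euclidean algorithm; the function name `extended_gcd` is our own choice for illustration, and the theorem itself only asserts that $s$ and $t$ exist.

```python
def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (d, s, t) with d = gcd(a, b) >= 0 and d == s*a + t*b."""
    # Loop invariants: old_r == old_s*a + old_t*b and r == s*a + t*b.
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    if old_r < 0:  # choose the nonnegative generator of the ideal (a, b)
        old_r, old_s, old_t = -old_r, -old_s, -old_t
    return old_r, old_s, old_t


d, s, t = extended_gcd(84, 30)
print(d, s, t)  # d generates the ideal (84, 30) and d == s*84 + t*30
```

Any common divisor $c$ of 84 and 30 divides $s \cdot 84 + t \cdot 30 = d$, which is exactly the last step of the proof.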

We can now show that Euclid’s Lemma holds in PIDs. 

Theorem 6.47 (Euclid’s Lemma). Let $R$ be a PID and $p \in R$ be irreducible.
If $p \mid ab$, where $a, b \in R$, then $p \mid a$ or $p \mid b$.

Proof. If $p \nmid a$, then 1 is a gcd of $p, a$, for the only divisors of $p$ are units and
associates. Thus, Theorem 6.46 says that there exist $s, t \in R$ with $1 = sp + ta$.
Hence, $b = spb + tab$. But $ab = pr$, for some $r \in R$, and so $p \mid b$, as desired. ■


How to Think About It. 

To prove the unique factorization theorem in $\mathbb{Z}$ and in $k[x]$, we first proved
that every element can be factored into irreducibles. After that, we showed that 
such factorizations are essentially unique. Let’s carry on with that development 
for arbitrary PIDs. 


To prove factorization into irreducibles in a PID, we need an abstract prop- 
erty of principal ideal domains, one that was previewed in Exercises 5.47 and 
5.48 on page 220. 

Suppose that $R$ is a PID and that $r \in R$ is neither zero nor a unit. Must $r$ be
a product of irreducibles? If not, then r is not irreducible (recall that we allow 
products to have only one factor); thus, r factors: say, r = ab , where neither 
a nor b is a unit. If both a and b are products of irreducibles, then so is r, and 
we’re done. So, suppose one of them, say a, is not a product of irreducibles. 
Thus, $a$ is not irreducible, and $a = cd$, where neither $c$ nor $d$ is a unit. If both
c and d are products of irreducibles, then so is a , and this is a contradiction. 
We’ve got a tiger by the tail! We can keep repeating this argument ad infinitum. 

Let’s rephrase these factorizations of $r$ in terms of ideals; after all, $a \mid r$
says that $r = r'a$ for some $r' \in R$; that is, $(r) \subseteq (a)$. But this inclusion must
be strict: $(r) \subsetneq (a)$, lest $r$ and $a$ be associates (they’re not, because $b$ is not a
unit). The tiger tells us that there is an infinite strictly increasing sequence of
ideals.

Lemma 6.48. If $R$ is a PID, then every ascending chain of ideals

\[ I_1 \subseteq I_2 \subseteq \cdots \subseteq I_n \subseteq I_{n+1} \subseteq \cdots \]

stops; that is, there is $N$ with $I_n = I_N$ for all $n \ge N$.

Proof. Suppose there is an ascending chain of ideals that does not stop. Throwing
away any repetitions $I_n = I_{n+1}$ if necessary, we may assume that there is
a strictly ascending chain of ideals

\[ I_1 \subsetneq I_2 \subsetneq \cdots \subsetneq I_n \subsetneq I_{n+1} \subsetneq \cdots. \]

By Exercise 6.31 on page 258, $J = \bigcup_{n \ge 1} I_n$ is an ideal in $R$. And since $R$
is a PID, $J$ is principal; there is $d \in J$ with $J = (d)$. Now $d$ got into $J$ by
being in $I_m$ for some $m$; that is, $(d) \subseteq I_m$. Hence,

\[ J = (d) \subseteq I_m \subsetneq I_{m+1} \subseteq J. \]

This is a contradiction. ■

Lemma 6.48 gives us factorization into irreducibles. 

Proposition 6.49. If $R$ is a PID, then every nonzero, non-unit $r \in R$ is a
product of irreducibles.

Proof. If $a$ is a divisor of $r$, then $(r) \subseteq (a)$, as we saw above. Call a divisor $a$
of $r$ a proper divisor if $a$ is neither a unit nor an associate of $r$. If $a$ is a proper
divisor of $r$, then $(r) \subsetneq (a)$: if the inclusion is not strict, then $(a) = (r)$, and
this forces $a$ and $r$ to be associates, by Proposition 6.6.

Call a nonzero non-unit $r \in R$ sweet if it is a product of irreducibles; call it
sour otherwise. We must show that there are no sour elements. So, suppose $r$
is a sour element. Now $r$ is not irreducible, so $r = ab$, where both $a$ and $b$ are
proper divisors. But the product of sweet elements is sweet, so that at least one
of the factors, say, $a$, is sour. As we observed in the first paragraph, we have
$(r) \subsetneq (a)$. Repeat this for $a$ instead of $r$. It follows by induction that there
exists a sequence $a_1 = r, a_2 = a, a_3, \ldots, a_n, \ldots$ of sour elements with each
$a_{n+1}$ a proper divisor of $a_n$. But this sequence yields a strictly ascending chain
of ideals

\[ (a_1) \subsetneq (a_2) \subsetneq (a_3) \subsetneq \cdots, \]

contradicting Lemma 6.48. ■

Proposition 6.49 gives existence. The next theorem gives a fundamental the- 
orem of arithmetic for PIDs: every nonzero non-unit has a unique factorization 
as a product of irreducibles. 

Theorem 6.50. Let $R$ be a PID. Every $r \in R$, neither 0 nor a unit, has a
factorization as a product of irreducibles which is unique in the following sense:
if

\[ p_1 \cdots p_n = r = q_1 \cdots q_m, \]

where the $p$’s and $q$’s are irreducible, then $m = n$ and the $q$’s can be
re-indexed so that $q_i$ and $p_i$ are associates for all $i$.

Proof. Proposition 6.49 shows that every $r \in R$, neither 0 nor a unit, is a
product of irreducibles.

To prove uniqueness, suppose that $r$ is a nonzero non-unit and

\[ p_1 \cdots p_n = r = q_1 \cdots q_m, \]

where the $p$’s and $q$’s are irreducible. By Euclid’s Lemma, $p_n$ irreducible
implies $p_n \mid q_i$ for some $i$. Since $q_i$ is irreducible, $p_n$ and $q_i$ are
associates: there is a unit $u \in R$ with $q_i = up_n$. Re-index the $q$’s so that $q_i$ is now
$q_m = up_n$, cancel $p_n$ from both sides, and replace $q_1$ by $uq_1$. Thus,

\[ p_1 \cdots p_{n-1} = (uq_1) q_2 \cdots q_{m-1}. \]

Note that $uq_1$ is irreducible (for it is an associate of an irreducible). The proof
is completed, as in Theorem 6.40, by induction on $\max\{n, m\}$. ■

So, every PID has a fundamental theorem of arithmetic. It turns out that 
there are other domains occurring in nature, not PIDs, which also enjoy such a 
theorem. 

Definition. A domain R is a unique factorization domain, usually abbreviated 
UFD, if 

(i) every $a \in R$ that is neither 0 nor a unit is a product of irreducibles;

(ii) this factorization is unique in the following sense: if

\[ p_1 \cdots p_n = a = q_1 \cdots q_m, \]

where the $p$’s and $q$’s are irreducible, then $n = m$ and, after re-indexing,
$p_i$ and $q_i$ are associates for all $i$.


Further Results. We’ve just seen that every PID is a UFD. The converse is 
false: there are UFDs that are not principal ideal domains. A theorem of Gauss 
states that if a domain A is a UFD, then A[x] is also a UFD. For example, Z[x] 
is a UFD (this is not a PID). If $k$ is a field, then it follows by induction on
$n \ge 1$ that $R = k[x_1, \ldots, x_n]$, polynomials in several variables, is a UFD ($R$
is not a PID if $n \ge 2$).


As we’ve mentioned earlier, the erroneous assumption that every domain 
is a UFD was behind many incorrect “proofs” of Fermat’s Last Theorem. The 
ring $\mathbb{Z}[\sqrt{-5}]$ is not a UFD: we’ll see in Chapter 8 that

\[ 3 \cdot 2 = 6 = (1 + \sqrt{-5})(1 - \sqrt{-5}) \]

are two different factorizations of 6 into irreducibles in $\mathbb{Z}[\sqrt{-5}]$ (and $1 + \sqrt{-5}$
is not an associate of 2 or of 3). Another example: $\mathbb{Z}[\zeta_{23}]$ is not a UFD, and 23
is the smallest prime $p$ for which $\mathbb{Z}[\zeta_p]$ is not a UFD (see [23] Chapter 1, p. 7).


Exercises 

6.31 * 

(i) Let $I$ and $J$ be ideals in a commutative ring $R$. Prove that their union $I \cup J$
is an ideal if and only if $I \subseteq J$ or $J \subseteq I$.

(ii) Let $I_1 \subseteq I_2 \subseteq \cdots \subseteq I_n \subseteq \cdots$ be an ascending chain of ideals in a
commutative ring $R$. Prove that

\[ \bigcup_{n \ge 1} I_n \]

is an ideal in $R$.
6.32 Consider ascending chains of ideals in $\mathbb{Z}$:

(i) Find two different ascending chains of ideals in which $I_1 = (24)$.

(ii) Show that every ascending chain of ideals has only finitely many distinct
terms.

(iii) Find the longest strictly ascending chain of ideals that starts with (72) (an
ascending chain of ideals is strictly ascending if all inclusions $I_j \subseteq I_{j+1}$ are
strict inclusions $I_j \subsetneq I_{j+1}$).

(iv) Find the longest strictly ascending chain of ideals that starts with (101).

6.33 * Let $R$ be the subset of $k[x]$ (where $k$ is a field) consisting of all polynomials
$f(x)$ having no linear term; that is,

\[ f(x) = a_0 + a_2x^2 + a_3x^3 + \cdots. \]

(i) Prove that $R$ is a subring of $k[x]$.

(ii) Prove that $x^5$ and $x^6$ do not have a gcd in $R$.

6.34 Recall that $\mathbb{R}^{\mathbb{R}}$, the set of all real valued functions of a real variable, is a
commutative ring under pointwise addition and multiplication. Let $n > 0$ be an integer,
and let $I_n$ be the set of all functions in $\mathbb{R}^{\mathbb{R}}$ vanishing on integer multiples of $n$.

(i) Show that $I_n$ is an ideal in $\mathbb{R}^{\mathbb{R}}$.

(ii) Find a function that is in Ig but not in / 4 . 

(iii) Show that 


hZhZhZ-'-Zht £••• 

(iv) Show that this ascending chain of ideals does not stop. 

(v) Conclude that there are ideals in $\mathbb{R}^{\mathbb{R}}$ that are not principal.

6.2 Irreducibility 

Although there are some techniques to help decide whether an integer is prime, 
the general problem is open and is very difficult (indeed, this is precisely why 
RSA public key codes are secure). Similarly, it is very difficult to determine 
whether a polynomial is irreducible, but there are some useful techniques that 
frequently work. Most of our attention will be on Q[x] and Z[x], but some of 
the results do generalize to other rings of coefficients. 

For polynomials of low degree, we have a simple and useful irreducibility 
criterion. 

Proposition 6.51. Let $k$ be a field and let $f(x) \in k[x]$ be a quadratic or cubic
polynomial. Then $f$ is irreducible in $k[x]$ if and only if $f$ has no root in $k$.

Proof. An irreducible polynomial $f$ of degree $> 1$ has no roots in $k$, by
Corollary 6.15, for if $r \in k$ is a root, then $f(x) = (x - r)g(x)$ in $k[x]$.
Conversely, if $f$ is not irreducible, then $f = gh$, where neither $g$ nor $h$
is constant; thus, neither $g$ nor $h$ has degree 0. Since $\deg(f) = 2$ or 3 and
$\deg(f) = \deg(g) + \deg(h)$, at least one of the factors has degree 1 and, hence,
$f$ has a root in $k$. ■

Proposition 6.51 is no longer true for polynomials of degree $\ge 4$; for example,
$f(x) = x^4 + 2x^2 + 1 = (x^2 + 1)(x^2 + 1)$ obviously factors in $\mathbb{R}[x]$, so
it’s not irreducible, yet $f$ has no real roots.
Recall that $\mathbb{F}_p$ is another
notation for $\mathbb{Z}_p$; we use it
when we want to regard
$\mathbb{Z}_p$ as a field.


A polynomial $f(x)$ is reducible if it has a linear factor $x - a$, and there’s a
simple test for that: see whether $a$ is a root of $f$. But to check whether $f$ has
a root $a$, we need a candidate for $a$.

Theorem 6.52 (Rational Root Theorem). If $f(x) = a_0 + a_1x + \cdots + a_nx^n \in \mathbb{Z}[x] \subseteq \mathbb{Q}[x]$,
then every rational root of $f$ has the form $b/c$, where $b \mid a_0$ and
$c \mid a_n$. In particular, if $f$ is monic, then every rational root of $f$ is an integer.

Proof. We may assume that a root $b/c$ is in lowest terms; that is, $\gcd(b, c) = 1$.
Evaluating gives $0 = f(b/c) = a_0 + a_1b/c + \cdots + a_nb^n/c^n$, and
multiplying through by $c^n$ gives

\[ 0 = a_0c^n + a_1bc^{n-1} + \cdots + a_nb^n. \]

Reducing this mod $b$ shows that $b \mid a_0c^n$; since $\gcd(b, c) = 1$, Corollary 1.22
gives $b \mid a_0$. Similarly, reducing mod $c$ gives $c \mid a_nb^n$. Since $\gcd(b, c) = 1$,
we have $c \mid a_n$. ■

It follows from the second statement that if an integer $a$ is not the $n$th power
of an integer, then $x^n - a$ has no rational roots; that is, $\sqrt[n]{a}$ is irrational. In
particular, $\sqrt{2}$ is irrational. Thus, Theorem 6.52 is a vast generalization of
Proposition 1.26.

Had we known Theorem 6.52 earlier, we could have easily dealt with the
“bad cubic” $f(x) = x^3 - 7x + 6$ in Example 3.5. Since the candidates for its
rational roots are $\pm 1, \pm 2, \pm 3, \pm 6$, we would have quickly found the
factorization $f(x) = (x - 1)(x - 2)(x + 3)$.
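The candidate search in Theorem 6.52 is easy to mechanize. Here is a short sketch, assuming the polynomial is given as a list of integer coefficients with the constant term first; the helper names `divisors` and `rational_roots` are ours, chosen for illustration.

```python
from fractions import Fraction


def divisors(n: int) -> list[int]:
    """Positive divisors of a nonzero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]


def rational_roots(coeffs: list[int]) -> list[Fraction]:
    """Rational roots of a0 + a1*x + ... + an*x^n, where coeffs = [a0, ..., an]
    and both a0 and an are nonzero."""
    a0, an = coeffs[0], coeffs[-1]
    # Theorem 6.52: every rational root is +-b/c with b | a0 and c | an.
    candidates = {Fraction(sign * b, c)
                  for b in divisors(a0) for c in divisors(an) for sign in (1, -1)}

    def evaluate(x: Fraction) -> Fraction:
        value = Fraction(0)
        for a in reversed(coeffs):  # Horner's rule
            value = value * x + a
        return value

    return sorted(r for r in candidates if evaluate(r) == 0)


# The "bad cubic" x^3 - 7x + 6 has roots -3, 1, 2.
print(rational_roots([6, -7, 0, 1]))
```

Exact `Fraction` arithmetic is used so that the test `evaluate(r) == 0` is not affected by rounding.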

If $f(x) \in \mathbb{Q}[x]$ happens to be in $\mathbb{Z}[x]$, there is a useful theorem of Gauss
comparing the factorizations of $f$ in $\mathbb{Z}[x]$ and in $\mathbb{Q}[x]$; it lets us conclude that $f$
is irreducible over $\mathbb{Q}$ once we know it is irreducible over $\mathbb{Z}$. Our proof involves
Example 5.23: the homomorphism $r_p : \mathbb{Z} \to \mathbb{Z}_p$, sending $j \mapsto [j]$, gives a
homomorphism $r_p^* : \mathbb{Z}[x] \to \mathbb{Z}_p[x]$, called reduction mod $p$. If
$f(x) = a_0 + a_1x + \cdots + a_nx^n \in \mathbb{Z}[x]$, then

\[ r_p^* : f \mapsto \bar{f}, \quad \text{where } \bar{f}(x) = [a_0] + [a_1]x + \cdots + [a_n]x^n \in \mathbb{Z}_p[x]. \]

Thus, $r_p^*$ merely reduces all coefficients mod $p$.

Theorem 6.53 (Gauss’s Lemma). Let $f(x) \in \mathbb{Z}[x]$. If there are $G(x), H(x) \in \mathbb{Q}[x]$
with $f = GH$, then there are $g(x), h(x) \in \mathbb{Z}[x]$ with $\deg(g) = \deg(G)$,
$\deg(h) = \deg(H)$, and $f = gh$.

Proof. Clearing denominators in the equation $f = GH$, there are positive
integers $n', n''$ so that $g = n'G$ and $h = n''H$, where both $g, h$ lie in $\mathbb{Z}[x]$.
Setting $n = n'n''$, we have

\[ nf = (n'G)(n''H) = gh \quad \text{in } \mathbb{Z}[x]. \tag{6.1} \]

Let $p$ be a prime divisor of $n$, and reduce the coefficients mod $p$. Eq. (6.1)
becomes

\[ 0 = \bar{g}(x)\bar{h}(x). \]

But $\mathbb{F}_p[x]$ is a domain, because $\mathbb{F}_p$ is a field, and so at least one of the factors,
say $\bar{g}$, is 0; that is, all the coefficients of $g$ are multiples of $p$. Therefore, we
may write $g = pg'$, where all the coefficients of $g'$ lie in $\mathbb{Z}$. If $n = pm$, then

\[ (pm)f = nf = gh = (pg')h \quad \text{in } \mathbb{Z}[x]. \]

Cancel $p$, and continue canceling primes until we reach a factorization
$f = g^*h^*$ in $\mathbb{Z}[x]$. Note that $\deg(g^*) = \deg(g)$ and $\deg(h^*) = \deg(h)$. ■
The contrapositive of Gauss’s Lemma is more convenient to use.

Corollary 6.54. If $f(x) \in \mathbb{Z}[x]$ is irreducible in $\mathbb{Z}[x]$, then $f$ is irreducible
in $\mathbb{Q}[x]$.


How to Think About It. We agree that Gauss’s Lemma, though very useful, 
is rather technical. Gauss saw that the ideas in the proof could be generalized 
to apply to polynomials in several variables over a field. 


The basic use of reduction mod p was previewed on page 216 when we 
proved that $-1$ is not a square in $\mathbb{Z}$ by showing that it’s not a square in $\mathbb{Z}_3$.
Reduction mod p gives a criterion for irreducibility of / in Z[x] by testing the 
irreducibility of / in Z p [x] . The precise statement is: 


Proposition 6.55. Let $f(x) = a_0 + a_1x + \cdots + x^n \in \mathbb{Z}[x]$ be monic. If $p$
is prime and $\bar{f} \in \mathbb{F}_p[x]$ is irreducible in $\mathbb{F}_p[x]$, then $f$ is irreducible in $\mathbb{Z}[x]$
and, hence, in $\mathbb{Q}[x]$.

Proof. Suppose $f$ factors in $\mathbb{Z}[x]$; say $f = gh$, where $0 < \deg(g) < \deg(f)$
and $0 < \deg(h) < \deg(f)$. By Exercise 6.8, we may assume that both $g$ and
$h$ are monic. Now $\bar{f} = \bar{g}\bar{h}$ (for $r_p^*$ is a homomorphism), so that
$\deg(\bar{f}) = \deg(\bar{g}) + \deg(\bar{h})$. And $\bar{f}$, $\bar{g}$, and $\bar{h}$ are monic, because $f$, $g$, and $h$ are, so
$\deg(\bar{f}) = \deg(f)$, $\deg(\bar{g}) = \deg(g)$, and $\deg(\bar{h}) = \deg(h)$; this contradicts
the irreducibility of $\bar{f}$ in $\mathbb{F}_p[x]$. Therefore, $f$ is irreducible in $\mathbb{Z}[x]$. Finally, $f$
is irreducible in $\mathbb{Q}[x]$, by Gauss’s Lemma. ■

For example, $x^2 + 1$ is irreducible in $\mathbb{Q}[x]$ because $x^2 + 1$ is irreducible
in $\mathbb{Z}_3[x]$.

Proposition 6.55 says that if one can find a prime $p$ with $\bar{f}$ irreducible in
$\mathbb{F}_p[x]$, then $f$ is irreducible in $\mathbb{Q}[x]$. The finiteness of $\mathbb{F}_p$ is a genuine
advantage, for there are only a finite number of polynomials in $\mathbb{F}_p[x]$ of any given
degree. In principle, then, we can test whether a polynomial of degree $n$ in
$\mathbb{Z}_p[x]$ is irreducible by looking at all possible factorizations of it.

The converse of Proposition 6.55 is false: $x^2 - 2$ is irreducible in $\mathbb{Q}[x]$ (it has
no rational root), but it factors mod 2 (as $x^2$); you can check, however, that
$x^2 - [2]$ is irreducible in $\mathbb{F}_3[x]$. But Proposition 6.55 may not apply at all: we’ll
see in Example 6.67 that $x^4 + 1$ is irreducible in $\mathbb{Q}[x]$, but it factors in $\mathbb{F}_p[x]$
for every prime $p$ (see [26], p. 304).

In order to use Proposition 6.55, we will need an arsenal of irreducible
polynomials over finite fields.


The hypothesis that $f$ is
monic can be relaxed; we 
may assume instead that p 
does not divide its leading 
coefficient. 


Example 6.56. We determine the irreducible polynomials in $\mathbb{F}_2[x]$ of small
degree.

As always, the linear polynomials $x$ and $x + 1$ are irreducible.

There are four quadratics: $x^2$, $x^2 + x$, $x^2 + 1$, $x^2 + x + 1$ (more generally,
there are $p^n$ monic polynomials of degree $n$ in $\mathbb{F}_p[x]$, for there are $p$ choices
for each of the $n$ coefficients $a_0, \ldots, a_{n-1}$). Since each of the first three has a
root in $\mathbb{F}_2$, there is only one irreducible quadratic, namely, $x^2 + x + 1$.
Note that $-1 = 2$ in $\mathbb{Z}_3$.


There are eight cubics, of which four are reducible because their constant 
term is 0 (so that x is a factor). The remaining polynomials are 

\[ x^3 + 1, \quad x^3 + x + 1, \quad x^3 + x^2 + 1, \quad x^3 + x^2 + x + 1. \]

Since 1 is a root of the first and fourth, the middle two are the only irreducible 
cubics. Proposition 6.51 now applies. 

There are sixteen quartics, of which eight are reducible because their con- 
stant term is 0. Of the eight with nonzero constant term, those having an even 
number of nonzero coefficients have 1 as a root. There are now only four sur- 
viving polynomials /, and each has no roots in F 2 ; that is, they have no linear 
factors. The only possible factorization for any of them is / = gh , where both 
g and h are irreducible quadratics. But there is only one irreducible quadratic, 
namely, $x^2 + x + 1$. Therefore, $x^4 + x^2 + 1 = (x^2 + x + 1)^2$ factors, and the
other three quartics are irreducible. 

Irreducible Polynomials of Low Degree over $\mathbb{F}_2$

Degree 2: $x^2 + x + 1$.

Degree 3: $x^3 + x + 1$, $x^3 + x^2 + 1$.

Degree 4: $x^4 + x^3 + 1$, $x^4 + x + 1$, $x^4 + x^3 + x^2 + x + 1$. ▲
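The enumeration in Example 6.56 can be checked by brute force. The sketch below represents a polynomial over $\mathbb{F}_p$ as a tuple of coefficients with the constant term first (a representation we chose for illustration) and marks as reducible every product of two monic polynomials of lower positive degree; over a field this loses nothing, since any factorization can be rescaled to monic factors.

```python
from itertools import product


def poly_mul(f, g, p):
    """Product of two coefficient tuples, with coefficients reduced mod p."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % p
    return tuple(out)


def monic_polys(deg, p):
    """All monic polynomials of degree deg over F_p (constant term first)."""
    for lower in product(range(p), repeat=deg):
        yield lower + (1,)


def monic_irreducibles(deg, p):
    """Monic irreducible polynomials of degree deg over F_p, by exhaustion."""
    reducible = set()
    for d in range(1, deg // 2 + 1):
        for g in monic_polys(d, p):
            for h in monic_polys(deg - d, p):
                reducible.add(poly_mul(g, h, p))
    return sorted(f for f in monic_polys(deg, p) if f not in reducible)


for deg in (2, 3, 4):
    print(deg, monic_irreducibles(deg, 2))
```

For instance, the tuple `(1, 1, 0, 0, 1)` stands for $1 + x + x^4$. The search is exponential in the degree, so it is only meant for small cases like the ones tabulated here.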


Example 6.57. Here is a list of the monic irreducible quadratics and cubics 
in F 3 [x ] . You can verify that the list is correct by first enumerating all such 
polynomials; there are six monic quadratics having nonzero constant term, and 
there are eighteen monic cubics having nonzero constant term. It must then be 
checked which of these have 1 or —1 as a root, for Proposition 6.51 applies. 


Monic Irreducible Quadratics and Cubics over $\mathbb{F}_3$

Degree 2: $x^2 + 1$, $x^2 + x - 1$, $x^2 - x - 1$.

Degree 3: $x^3 - x + 1$, $x^3 + x^2 - x + 1$, $x^3 - x^2 + 1$,
          $x^3 - x^2 + x + 1$, $x^3 - x - 1$, $x^3 + x^2 - 1$,
          $x^3 + x^2 + x - 1$, $x^3 - x^2 - x - 1$. ▲


Example 6.58. Here are some applications of Proposition 6.55.

(i) The polynomial $f(x) = 3x^3 - 3x + 1$ is irreducible in $\mathbb{Q}[x]$, for
$\bar{f} = x^3 + x + 1$ is irreducible in $\mathbb{F}_2[x]$.

(ii) We show that $f(x) = x^4 - 5x^3 + 2x + 3$ is irreducible in $\mathbb{Q}[x]$. By
Theorem 6.52, the only candidates for rational roots of $f$ are $\pm 1$ and $\pm 3$,
and you can check that none is a root. Since $f$ is a quartic, we cannot yet
conclude that it is irreducible, for it might be a product of (irreducible)
quadratics.

The criterion of Proposition 6.55 works like a charm. Since
$\bar{f} = x^4 + x^3 + 1$ in $\mathbb{F}_2[x]$ is irreducible, by Example 6.56, it follows that $f$ is
irreducible in $\mathbb{Q}[x]$. (It wasn’t necessary to check that $f$ has no rational roots;
irreducibility of $\bar{f}$ is enough to conclude irreducibility of $f$. In spite of
this, it is a good habit to first check for rational roots.)

(iii) Let $\Phi_5(x) = x^4 + x^3 + x^2 + x + 1 \in \mathbb{Q}[x]$. In Example 6.56, we saw
that $\bar{\Phi}_5(x) = x^4 + x^3 + x^2 + x + 1$ is irreducible in $\mathbb{F}_2[x]$, and so $\Phi_5$ is
irreducible in $\mathbb{Q}[x]$. ▲
Further Results.

We can count the number $N_n$ of monic irreducible polynomials of degree $n$ in
$\mathbb{F}_p[x]$. In [17], pp. 83-84, it is shown that

\[ p^n = \sum_{d \mid n} dN_d, \tag{6.2} \]

where the sum is over the positive divisors $d$ of $n$.

This equation can be solved for $N_n$. If $m = p_1^{e_1} \cdots p_n^{e_n}$, define the Möbius
function by

\[ \mu(m) = \begin{cases} 1 & \text{if } m = 1; \\ 0 & \text{if any } e_i > 1; \\ (-1)^n & \text{if } 1 = e_1 = e_2 = \cdots = e_n. \end{cases} \]

It turns out that Eq. (6.2) is equivalent to

\[ N_n = \frac{1}{n} \sum_{d \mid n} \mu(d) p^{n/d}. \]

One application of this formula is that, for every $n \ge 1$, there exists an
irreducible polynomial in $\mathbb{F}_p[x]$ of degree $n$.
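The Möbius formula for $N_n$ is easy to evaluate. The following sketch (function names are ours) computes $\mu$ by trial division and then $N_n$, and checks the result against Eq. (6.2) and against the counts found in Examples 6.56 and 6.57.

```python
def mobius(m: int) -> int:
    """Moebius function: 0 if a square divides m, else (-1)^(number of primes)."""
    result = 1
    q = 2
    while q * q <= m:
        if m % q == 0:
            m //= q
            if m % q == 0:      # q^2 divided the original m
                return 0
            result = -result
        q += 1
    if m > 1:                   # one prime factor is left over
        result = -result
    return result


def count_irreducibles(n: int, p: int) -> int:
    """N_n = (1/n) * sum over d | n of mu(d) * p^(n/d)."""
    total = sum(mobius(d) * p ** (n // d) for d in range(1, n + 1) if n % d == 0)
    return total // n


# Degrees 2, 3, 4 over F_2, and degrees 2, 3 over F_3.
print([count_irreducibles(n, 2) for n in (2, 3, 4)])
print([count_irreducibles(n, 3) for n in (2, 3)])

# Sanity check of Eq. (6.2) for n = 6, p = 2.
assert sum(d * count_irreducibles(d, 2) for d in (1, 2, 3, 6)) == 2 ** 6
```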


The definition of $\mu$ seems
to come out of nowhere, 
but it occurs in many prob- 
lems at the intersection of 
combinatorics and num- 
ber theory. See A.Cuoco, 
“Searching for Mobius,” 
College Mathematics 
Journal , 37:2, (148-153), 
2009. 


Exercises 

6.35 Let $f(x) = x^2 + x + 1 \in \mathbb{F}_2[x]$. Prove that $f$ is irreducible in $\mathbb{F}_2[x]$, but that
$f$ has a root $\alpha \in \mathbb{F}_4$. Use the construction of $\mathbb{F}_4$ in Exercise 4.55 on page 165 to
display $\alpha$ explicitly.

6.36 Show that $x^4 + x + 1$ is not irreducible in $\mathbb{R}[x]$ even though it has no roots in $\mathbb{R}$.

6.37 (i) If $k$ is a field and each of $f(x), g(x) \in k[x]$ has a root $\alpha$ in $k$, show that $\alpha$ is
a root of $\gcd(f, g)$.

(ii) How does this apply to the polynomials in Examples 6.35 and 6.36? 

6.38 If $p$ is a prime, show that, in $\mathbb{F}_p[x]$,

\[ x^p - x = \prod_{i=0}^{p-1} (x - i) \quad \text{and} \quad x^{p-1} - 1 = \prod_{i=1}^{p-1} (x - i). \]

6.39 (Wilson’s Theorem). Suppose that $p$ is a prime in $\mathbb{Z}$. Show that

\[ (p - 1)! \equiv -1 \bmod p. \]


6.40 * 

(i) Let $f(x) = (x - a_1) \cdots (x - a_n) \in k[x]$, where $k$ is a field. Show that $f$ has
no repeated roots (i.e., all the $a_i$ are distinct) if and only if $\gcd(f, f') = 1$,
where $f'$ is the derivative of $f$.

Hint. Use Exercise 5.17 on page 203.

(ii) Prove that if $p(x) \in \mathbb{Q}[x]$ is an irreducible polynomial, then $p$ has no
repeated roots in $\mathbb{C}$.


Hint. Use Corollary 6.37. 
6.41 If $p$ is prime, prove that there are exactly $\frac{1}{3}(p^3 - p)$ monic irreducible cubic
polynomials in $\mathbb{F}_p[x]$.

6.42 Determine whether the following polynomials are irreducible in $\mathbb{Q}[x]$.

(i) $f(x) = x^5 - 4x + 2$.

(ii) $f(x) = x^4 + x^2 + x + 1$.

Hint. Show that $f$ has no roots in $\mathbb{F}_3$ and that a factorization of $f$ as a product
of quadratics would force impossible restrictions on the coefficients.

(iii) $f(x) = x^4 - 10x^2 + 1$.

Hint. Show that $f$ has no rational roots and that a factorization of $f$ as a
product of quadratics would force impossible restrictions on the coefficients.

6.43 Is $x^5 + x + 1$ irreducible in $\mathbb{F}_2[x]$?

Hint. Use Example 6.56. 

6.44 Let $f(x) = (x^p - 1)/(x - 1)$, where $p$ is prime. Using the identity

\[ f(x + 1) = x^{p-1} + pq(x), \]

where $q(x) \in \mathbb{Z}[x]$ has constant term 1, prove that

\[ \Phi_p(x^{p^n}) = x^{p^n(p-1)} + \cdots + x^{p^n} + 1 \]

is irreducible in $\mathbb{Q}[x]$ for all $n \ge 0$.

6.45 Let $k$ be a field, and let $f(x) = a_0 + a_1x + \cdots + a_nx^n \in k[x]$ have degree $n$ and
nonzero constant term $a_0$. If $f$ is irreducible, prove that $a_n + a_{n-1}x + \cdots + a_0x^n$
is irreducible.


Roots of Unity 

In Chapter 3, we defined an $n$th root of unity $\zeta$ to be primitive if every $n$th root
of unity is a power of $\zeta$. For example, $i$ is a primitive 4th root of unity. Note
that $i$ is also an 8th root of unity, for $i^8 = 1$, but it’s not a primitive 8th root of
unity; $\frac{\sqrt{2}}{2}(1 + i)$ is a primitive 8th root of unity.

Lemma 6.59. Every $n$th root of unity $\zeta \in \mathbb{C}$ is a primitive $d$th root of unity
for a unique divisor $d$ of $n$.


Proof. We know that $\zeta^n = 1$; let $d$ be the smallest positive integer for which
$\zeta^d = 1$. By the Division Algorithm, there are integers $q$ and $r$ with $n = qd + r$,
where $0 \le r < d$. Now

\[ 1 = \zeta^n = \zeta^{qd+r} = \zeta^{dq}\zeta^r = \zeta^r, \]

because $\zeta^{dq} = 1$. But $r < d$ and $\zeta^r = 1$; if $r > 0$, then we contradict $d$ being
the smallest positive such exponent. Therefore, $r = 0$ and $d \mid n$. This shows
that $\zeta$ is a $d$th root of unity. Its first $d$ powers,

\[ 1, \zeta, \zeta^2, \ldots, \zeta^{d-1}, \]

are all distinct (Exercise 6.51 on page 269). Since there are exactly $d$ $d$th roots
of unity, they are all powers of $\zeta$, and so $\zeta$ is primitive. ■
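Lemma 6.59 can be explored numerically without any floating-point arithmetic. Writing an $n$th root of unity as $\zeta = e^{2\pi i k/n}$, we have $\zeta^d = 1$ exactly when $n \mid kd$, so the integer $d$ of the proof can be found by a direct search. This is a small sketch; the function name `order_of_root` is ours.

```python
from math import gcd


def order_of_root(k: int, n: int) -> int:
    """Smallest d >= 1 with zeta**d == 1, where zeta = exp(2*pi*i*k/n)."""
    # zeta**d == 1 exactly when n divides k*d; d = n always works.
    return next(d for d in range(1, n + 1) if (k * d) % n == 0)


# i = exp(2*pi*i * 2/8) is an 8th root of unity but a primitive 4th root.
print(order_of_root(2, 8))

# The d found this way always divides n, as the lemma asserts.
assert all(n % order_of_root(k, n) == 0 for n in range(1, 40) for k in range(n))
```

In this parametrization the search agrees with the closed form $d = n/\gcd(k, n)$, a standard fact that the text does not need.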




Definition. If $d$ is a positive integer, then the $d$th cyclotomic polynomial is
defined by

\[ \Phi_d(x) = \prod (x - \zeta), \]

where $\zeta$ ranges over all the primitive $d$th roots of unity.

Proposition 6.60. Let $n$ be a positive integer and regard $x^n - 1 \in \mathbb{Z}[x]$. Then

(i)

\[ x^n - 1 = \prod_{d \mid n} \Phi_d(x), \]

where $d$ ranges over all the positive divisors $d$ of $n$ (in particular, both
$\Phi_1(x)$ and $\Phi_n(x)$ are factors).

(ii) $\Phi_n(x)$ is a monic polynomial in $\mathbb{Z}[x]$.

Proof. (i) For each divisor $d$ of $n$, collect all terms in the equation
$x^n - 1 = \prod (x - \zeta)$ with $\zeta$ a primitive $d$th root of unity. Thus,

\[ x^n - 1 = \prod_{d \mid n} h_d(x), \]

where $h_d(x) = \prod (x - \zeta)$ with $\zeta$ an $n$th root of unity that is also a
primitive $d$th root of unity (by Lemma 6.59, each $n$th root of unity occurs for exactly
one $d$). But every primitive $d$th root of unity $\zeta$ is an $n$th root of unity:
$n = dq$ for some integer $q$, and $1 = (\zeta^d)^q = \zeta^{dq} = \zeta^n$. Therefore,
$h_d(x) = \Phi_d(x)$.

(ii) The proof is by strong induction on $n \ge 1$. The base step is true, for
$\Phi_1(x) = x - 1$. For the inductive step $n > 1$, write

\[ x^n - 1 = \Phi_n(x)F(x), \]

where $F(x) = \prod \Phi_d(x)$ with $d \mid n$ and $d < n$. The inductive hypothesis
says that all the factors $\Phi_d$ of $F$ are monic polynomials in $\mathbb{Z}[x]$;
hence, $F$ is a monic polynomial in $\mathbb{Z}[x]$. By Proposition 6.10,
$\Phi_n(x) = (x^n - 1)/F(x)$ is a monic polynomial in $\mathbb{Z}[x]$, as desired. ■

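Example 6.61. The formula in Proposition 6.60(i) can be used to calculate
$\Phi_n(x)$ for any $n$. Indeed, solving for $\Phi_n(x)$ in

\[ x^n - 1 = \prod_{d \mid n} \Phi_d(x), \]

we have

\[ \Phi_n(x) = \frac{x^n - 1}{\prod_{d \mid n,\ d < n} \Phi_d(x)}. \]

Using the fact that $\Phi_1(x) = x - 1$, we have a recursively defined function:

\[ \Phi_n(x) = \begin{cases} x - 1 & \text{if } n = 1 \\ (x^n - 1) \big/ \prod_{d \mid n,\ d < n} \Phi_d(x) & \text{if } n > 1. \end{cases} \]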
This proposition sheds
light on your discovery in 
Exercise 3.59 on page 116. 
    n     $\Phi_n(x)$
    1     $x - 1$
    2     $x + 1$
    3     $x^2 + x + 1$
    4     $x^2 + 1$
    5     $x^4 + x^3 + x^2 + x + 1$
    6     $x^2 - x + 1$
    7     $x^6 + x^5 + x^4 + x^3 + x^2 + x + 1$
    8     $x^4 + 1$
    9     $x^6 + x^3 + 1$
    10    $x^4 - x^3 + x^2 - x + 1$
    11    $x^{10} + x^9 + x^8 + x^7 + x^6 + x^5 + x^4 + x^3 + x^2 + x + 1$
    12    $x^4 - x^2 + 1$

Figure 6.1. Cyclotomic polynomials.

You should verify that $x^{12} - 1 = \prod_{d \in \{1, 2, 3, 4, 6, 12\}} \Phi_d(x)$. The recursive
definition can be programmed into a CAS (see Exercise 6.55 on page 270);
Figure 6.1 displays the first dozen cyclotomic polynomials. There’s no simple
pattern to these polynomials, but calculating a good number of them gives
you food for thought and leads to interesting conjectures. For example, can
you conjecture anything about $\deg(\Phi_n)$? All the coefficients of the cyclotomic
polynomials displayed in Figure 6.1 are 0 and $\pm 1$, but your guess that this is
always true is wrong [see Exercise 6.55(iii) on page 270]. Do any of the $\Phi_n(x)$
factor in $\mathbb{Z}[x]$? ▲
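If no CAS is at hand, the recursion of Example 6.61 can be sketched directly in Python. The version below is one possible implementation, not the one the exercise has in mind: polynomials are coefficient sequences with the constant term first, and `divide_exact` is a helper of our own that relies on the divisor being monic (which Proposition 6.60(ii) guarantees).

```python
from functools import lru_cache


def divide_exact(num, den):
    """Quotient of integer polynomials (coefficients, constant term first).
    Assumes den is monic and divides num with zero remainder."""
    rem = list(num)
    quotient = [0] * (len(rem) - len(den) + 1)
    for k in range(len(quotient) - 1, -1, -1):
        c = rem[k + len(den) - 1]      # current leading coefficient
        quotient[k] = c
        for j, b in enumerate(den):    # subtract c * x^k * den
            rem[k + j] -= c * b
    if any(rem):
        raise ValueError("division was not exact")
    return quotient


@lru_cache(maxsize=None)
def cyclotomic(n):
    """Coefficients of Phi_n(x), constant term first, via the recursion."""
    result = [-1] + [0] * (n - 1) + [1]            # x^n - 1
    for d in range(1, n):
        if n % d == 0:                              # proper divisors d of n
            result = divide_exact(result, cyclotomic(d))
    return tuple(result)


for n in range(1, 13):
    print(n, cyclotomic(n))
```

The printed tuples can be compared with Figure 6.1; for example, `cyclotomic(12)` encodes $x^4 - x^2 + 1$ as `(1, 0, -1, 0, 1)`.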

When $p \le 11$ is prime, $\Phi_p(x)$ is $x^{p-1} + x^{p-2} + \cdots + x^2 + x + 1$. We
now prove this is true for every prime $p$.

Proposition 6.62. If $p$ is prime, then

\[ \Phi_p(x) = x^{p-1} + x^{p-2} + \cdots + x^2 + x + 1. \]

Proof. By Proposition 6.60,

\[ x^p - 1 = \Phi_1(x)\Phi_p(x) = (x - 1)\Phi_p(x), \]

and the Division Algorithm gives

\[ \Phi_p(x) = \frac{x^p - 1}{x - 1} = x^{p-1} + x^{p-2} + \cdots + x^2 + x + 1. \] ■

Recall that the Euler $\phi$-function $\phi(n)$ is defined by

\[ \phi(n) = \text{number of } k \text{ with } 1 \le k \le n \text{ and } \gcd(k, n) = 1. \]

The next proposition shows that $\phi(n)$ is intimately related to $\Phi_n(x)$, and this
leads to a simple proof of a fact from number theory.
Proposition 6.63. (i) $\phi(n) = \deg(\Phi_n)$.

(ii) For every integer $n \ge 1$, we have $n = \sum_{d \mid n} \phi(d)$.

Proof. (i) This follows at once from Corollary 3.30, which says that there
are $\phi(n)$ primitive $n$th roots of unity.

(ii) Immediate from Proposition 6.60(i) and part (i), for

\[ n = \sum_{d \mid n} \deg(\Phi_d) = \sum_{d \mid n} \phi(d). \] ■
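The identity in Proposition 6.63(ii) is easy to spot-check numerically. The sketch below computes $\phi$ straight from its definition (slow, but transparent); the function names are ours.

```python
from math import gcd


def phi(n: int) -> int:
    """Euler phi-function, computed directly from the definition."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def divisor_phi_sum(n: int) -> int:
    """Sum of phi(d) over the positive divisors d of n."""
    return sum(phi(d) for d in range(1, n + 1) if n % d == 0)


# For n = 12 the divisors 1, 2, 3, 4, 6, 12 contribute 1 + 1 + 2 + 2 + 2 + 4.
print([phi(d) for d in (1, 2, 3, 4, 6, 12)], divisor_phi_sum(12))
```

These values match the degrees of $\Phi_1, \Phi_2, \Phi_3, \Phi_4, \Phi_6, \Phi_{12}$ in Figure 6.1, whose product is $x^{12} - 1$.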

We’ve shown that $\Phi_n(x) \in \mathbb{Z}[x]$, and we’ll finish this section by showing
that $\Phi_p$ is irreducible in $\mathbb{Q}[x]$ when $p$ is prime. It turns out that $\Phi_n$ is actually
irreducible in $\mathbb{Q}[x]$ for every $n$, but the proof is more difficult (see [17] p. 195).

Like any linear polynomial over a field, the cyclotomic polynomial $\Phi_2 = x + 1$
is irreducible in $\mathbb{Q}[x]$; $\Phi_3 = x^2 + x + 1$ is irreducible in $\mathbb{Q}[x]$ because it
has no rational roots; we saw, in Example 6.58, that $\Phi_5$ is irreducible in $\mathbb{Q}[x]$.
We’ll next introduce another irreducibility criterion, useful in its own right,
that will allow us to prove that $\Phi_p$ is irreducible in $\mathbb{Q}[x]$ for all primes $p$. An
example will motivate the criterion.


Proposition 6.63(ii) is often 
proved in number theory 
courses without mentioning 
cyclotomic polynomials; 
the resulting proof is much 
more difficult. 


Example 6.64. Exercise 6.30 on page 254 asked you to show that
$f(x) = x^3 + 5x^2 - 10x + 15$ is irreducible in $\mathbb{Z}[x]$. You now have machinery that makes
this easy. For example, you could invoke Theorem 6.52 (the Rational Root
Theorem) to show that $f$ has no root in $\mathbb{Q}$ (or $\mathbb{Z}$) and then use Proposition 6.51.

But let’s use another technique that shows the power of reducing
coefficients. Suppose that $f(x), g(x), h(x) \in \mathbb{Z}[x]$ and $f = gh$, where neither $g$
nor $h$ is constant; reduce the coefficients mod 5. Because reduction mod 5 is
a homomorphism, we have $\bar{f} = \bar{g}\bar{h}$. But all the coefficients of $f$ (except the
leading one) are divisible by 5, so we have

\[ x^3 = \bar{g}\bar{h} \quad \text{in } \mathbb{Z}_5[x]. \]

Since $x$ is irreducible (it’s a linear polynomial), we can apply unique
factorization in $\mathbb{Z}_5[x]$ to conclude that both $\bar{g}$ and $\bar{h}$ are of the form $ux^m$ where $u$ is
a unit in $\mathbb{Z}_5$. Pulling this back to $\mathbb{Z}$, we see that all the coefficients of $g$ and $h$,
except their leading coefficients, are divisible by 5. Hence the constant term of
$gh$ (which is the product of the constant terms of $g$ and $h$) is divisible by 25.
But $gh = f$ and the constant term of $f$ is 15, which is not divisible by 25.
Hence no non-trivial factorization of $f$ exists. ▲


Theorem 6.65 (Eisenstein Criterion). Let $f(x) = a_0 + a_1x + \cdots + a_nx^n \in \mathbb{Z}[x]$.
If there is a prime $p$ dividing $a_i$ for all $i < n$ but with $p \nmid a_n$ and
$p^2 \nmid a_0$, then $f$ is irreducible in $\mathbb{Q}[x]$.

Proof. (R. Singer). Let $r_p^* : \mathbb{Z}[x] \to \mathbb{F}_p[x]$ be reduction mod $p$, and let $\bar{f}$
denote $r_p^*(f)$. If $f$ is not irreducible in $\mathbb{Q}[x]$, then Gauss’s Lemma gives
polynomials $g(x), h(x) \in \mathbb{Z}[x]$ with $f = gh$, where $g(x) = b_0 + b_1x + \cdots + b_mx^m$,
$h(x) = c_0 + c_1x + \cdots + c_kx^k$, and $m, k > 0$. There is thus an equation $\bar{f} = \bar{g}\bar{h}$
in $\mathbb{F}_p[x]$.

Since $p \nmid a_n$, we have $\bar{f} \ne 0$; in fact, $\bar{f} = ux^n$ for some unit $u \in \mathbb{F}_p$,
because all its coefficients, aside from its leading coefficient, are 0. By
Theorem 6.40, unique factorization in $k[x]$ where $k$ is a field, we must have
$\bar{g} = vx^m$ and $\bar{h} = wx^k$ (for units $v, w$ in $\mathbb{F}_p$), so that each of $\bar{g}$ and $\bar{h}$ has
constant term 0. Thus, $[b_0] = 0 = [c_0]$ in $\mathbb{F}_p$; equivalently, $p \mid b_0$ and $p \mid c_0$.
But $a_0 = b_0c_0$, and so $p^2 \mid a_0$, a contradiction. Therefore, $f$ is irreducible in
$\mathbb{Q}[x]$. ■


Usually, Kadiddlehopper
was the first to discover
Kadiddlehopper’s Theorem,
but not always. For
example, the Eisenstein
Criterion is in a paper of
Eisenstein of 1850, but
it appeared in a paper of
Schonemann in 1845.

Let’s see that $\Phi_p(x) = x^{p-1} + x^{p-2} + \cdots + x + 1$ is irreducible in $\mathbb{Q}[x]$
when $p$ is prime. Gauss showed how to transform $\Phi_p(x)$ so that the Eisenstein
Criterion applies.

Lemma 6.66. Let $g(x) \in \mathbb{Z}[x]$. If there is $c \in \mathbb{Z}$ with $g(x + c)$ irreducible in
$\mathbb{Z}[x]$, then $g$ is irreducible in $\mathbb{Q}[x]$.

Proof. By Exercise 5.42 on page 216, the function $\varphi : \mathbb{Z}[x] \to \mathbb{Z}[x]$, given by

\[ f(x) \mapsto f(x + c), \]

is an isomorphism (its inverse is $f(x) \mapsto f(x - c)$). If $g(x) = s(x)t(x)$, then

\[ \varphi(g) = \varphi(st) = \varphi(s)\varphi(t). \tag{6.3} \]

But $\varphi(g) = g(x + c)$, so that Eq. (6.3) is a forbidden factorization of $g(x + c)$.
Hence, $g$ is irreducible in $\mathbb{Z}[x]$, and Corollary 6.54 says that $g$ is irreducible in $\mathbb{Q}[x]$. ■


Example 6.67. Consider $f(x) = x^4 + 1 \in \mathbb{Q}[x]$. Now

\[ f(x + 1) = (x + 1)^4 + 1 = x^4 + 4x^3 + 6x^2 + 4x + 2. \]

The Eisenstein Criterion, using the prime $p = 2$, shows that $f(x + 1)$ is
irreducible in $\mathbb{Q}[x]$, and Lemma 6.66 shows that $x^4 + 1$ is irreducible in $\mathbb{Q}[x]$. ▲
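The two steps used in Example 6.67, shifting $x \mapsto x + c$ and then testing the Eisenstein hypotheses, are both mechanical. Here is a minimal sketch, assuming coefficient lists with the constant term first; the names `shift` and `eisenstein_applies` are ours. Note that a `False` answer only means the criterion gives no information for that prime, not that the polynomial is reducible.

```python
from math import comb


def eisenstein_applies(coeffs, p):
    """True if p | a_i for all i < n, p does not divide a_n, and p^2 does not
    divide a_0, where coeffs = [a_0, ..., a_n]."""
    *lower, leading = coeffs
    return (all(a % p == 0 for a in lower)
            and leading % p != 0
            and coeffs[0] % (p * p) != 0)


def shift(coeffs, c):
    """Coefficients of f(x + c), expanding each a_i (x + c)^i binomially."""
    out = [0] * len(coeffs)
    for i, a in enumerate(coeffs):
        for j in range(i + 1):
            out[j] += a * comb(i, j) * c ** (i - j)
    return out


f = [1, 0, 0, 0, 1]                 # x^4 + 1
print(eisenstein_applies(f, 2))     # the criterion does not apply to f itself
g = shift(f, 1)                     # x^4 + 4x^3 + 6x^2 + 4x + 2
print(g, eisenstein_applies(g, 2))
```

The same two functions handle Example 6.64 directly (`[15, -10, 5, 1]` with $p = 5$) and the shifted polynomial $\Phi_p(x + 1)$ for a prime $p$.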


Theorem 6.68 (Gauss). For every prime $p$, the $p$th cyclotomic polynomial
$\Phi_p(x)$ is irreducible in $\mathbb{Q}[x]$.

Proof. Since $\Phi_p(x) = (x^p - 1)/(x - 1)$, we have

\[ \Phi_p(x + 1) = [(x + 1)^p - 1]/x = x^{p-1} + \binom{p}{1}x^{p-2} + \binom{p}{2}x^{p-3} + \cdots + p. \]

Since $p$ is prime, we have $p \mid \binom{p}{i}$ for all $i$ with $0 < i < p$ (Proposition 2.26);
hence, the Eisenstein Criterion applies, and $\Phi_p(x + 1)$ is irreducible in $\mathbb{Q}[x]$.
By Lemma 6.66, $\Phi_p$ is irreducible in $\mathbb{Q}[x]$. ■


It’s not true that $x^{n-1} + x^{n-2} + \cdots + x + 1$ is
irreducible when $n$ is not
prime. For example, when
$n = 4$, $x^3 + x^2 + x + 1 = (x + 1)(x^2 + 1)$.


Further results. Gauss used Theorem 6.68 to prove that a regular 17-gon 
can be constructed with ruler and compass (ancient Greek mathematicians did 
not know this). He also constructed regular 257-gons and 65537-gons. We will 
look at ruler-compass constructions in Chapter 7. 
Exercises

6.46 * Let $\zeta = e^{2\pi i/n}$ be a primitive $n$th root of unity.

(i) Prove, for all $n \ge 1$, that

\[ x^n - 1 = (x - 1)(x - \zeta)(x - \zeta^2) \cdots (x - \zeta^{n-1}), \]

and, if $n$ is odd, that

\[ x^n + 1 = (x + 1)(x + \zeta)(x + \zeta^2) \cdots (x + \zeta^{n-1}). \]

(ii) For numbers $a$ and $b$, prove that

\[ a^n - b^n = (a - b)(a - \zeta b)(a - \zeta^2 b) \cdots (a - \zeta^{n-1} b) \]

and, if $n$ is odd, that

\[ a^n + b^n = (a + b)(a + \zeta b)(a + \zeta^2 b) \cdots (a + \zeta^{n-1} b). \]

Hint. Set $x = a/b$ if $b \ne 0$.

6.47 * Let $k$ be a field and $a \in k$. Show that, in $k[x]$,

\[ x^n - a^n = (x - a)(x^{n-1} + x^{n-2}a + x^{n-3}a^2 + \cdots + xa^{n-2} + a^{n-1}). \]

6.48 If k is a field, ask , and f(x) = c n x n + c n —\x n ~ 1 + ••• + co 6 k[x\, then 
rewrite 

fix)- f (a) = (c n x n +c n - 1 x n ~ 1 H h c 0 ) - (c„a n +c n - 1 a n ~ 1 -4 h c 0 ) 

and use Exercise 6.47 to give another proof of Corollary 6.15. 

6.49 Determine whether the following polynomials are irreducible in $\mathbb{Q}[x]$.

(i) $f(x) = 3x^2 - 7x - 5$. (ii) $f(x) = 2x^3 - x - 6$.

(iii) $f(x) = 8x^3 - 6x - 1$. (iv) $f(x) = x^3 + 6x^2 + 5x + 25$.

(v) $f(x) = x^4 + 8x + 12$.

Hint. In $\mathbb{F}_5[x]$, $f(x) = (x + 1)g(x)$, where $g$ is irreducible.

6.50 Use the Eisenstein Criterion to prove that if $a$ is a squarefree integer, then
$x^n - a$ is irreducible in $\mathbb{Q}[x]$ for every $n \ge 1$. Conclude that there are irreducible
polynomials in $\mathbb{Q}[x]$ of every degree $n \ge 1$.

6.51 * In the proof of Lemma 6.59, we claimed that the first $d$ powers of $\zeta$ are distinct,
where $\zeta$ is an $n$th root of unity and $d$ is the smallest positive integer with $\zeta^d = 1$.
Prove this claim.

6.52 * Let $\zeta$ be an $n$th root of unity. Lemma 6.59 shows that $\zeta$ is a primitive $d$th root
of unity for some divisor $d$ of $n$. Show that the divisor is unique.

6.53 Consider a finite table of data:

    Input      Output
    $a_1$      $b_1$
    $a_2$      $b_2$
    $a_3$      $b_3$
    $\vdots$   $\vdots$
    $a_n$      $b_n$

Show that two polynomial functions (defined over $\mathbb{Q}$) agree on the table if and
only if their difference is divisible by

\[ \prod_{i=1}^{n} (x - a_i). \]

6.54 (i) Show that the set of polynomials in $\mathbb{Q}[x]$ that vanish on $\{a_1, \ldots, a_n\}$ is an
ideal in $\mathbb{Q}[x]$.

(ii) What is a generator of this ideal?

6.55 *

(i) Implement the recursively defined function for $\Phi_n(x)$ given in Example 6.61
in a CAS.

(ii) Use it to generate $\Phi_n(x)$ for, say, $1 \le n \le 50$.

(iii) Use the CAS to find the smallest value of $n$ for which a coefficient of $\Phi_n(x)$
is not 0, 1, or $-1$.


6.3 Connections: Lagrange Interpolation 


There are several methods 
for fitting polynomial 
functions to data; see 
Chapter 1 of [7].


A popular activity in high school mathematics is finding a polynomial that 
agrees with a table of data. For example, students are often asked to find a 
polynomial agreeing with a table like this: 


    Input   Output
    -3      12
     2      22
     3      72
    -4      -26


On the surface, the problem of fitting data seems to have little to do with the 
ideas in this chapter. However, by placing it in a more abstract setting, we’ll 
see that it yields Lagrange Interpolation, a result useful in its own right. Thus, 
this problem fits right into the theory of commutative rings; in fact, it’s really 
the Chinese Remainder Theorem! But isn’t the Chinese Remainder Theorem 
about solving some congruences? Well, yes, and we’ll see that we can make 
the notion of congruence apply here. But first, let’s find a polynomial $f(x)$ by
hand that fits the table.

\[
\begin{aligned}
f(-3) = 12 &\iff \text{the remainder when } f(x) \text{ is divided by } (x + 3) \text{ is } 12\\
f(2) = 22 &\iff \text{the remainder when } f(x) \text{ is divided by } (x - 2) \text{ is } 22\\
f(3) = 72 &\iff \text{the remainder when } f(x) \text{ is divided by } (x - 3) \text{ is } 72\\
f(-4) = -26 &\iff \text{the remainder when } f(x) \text{ is divided by } (x + 4) \text{ is } -26.
\end{aligned}
\]


The statement about remainders looks structurally similar to the solution we 
constructed to the problem from Qin Jiushao on page 146. We now make this 
similarity precise. 

Recall that $a \equiv b \bmod m$ in $\mathbb{Z}$ means that $m \mid (a - b)$. We can define
congruence in $k[x]$, where $k$ is a field: given $m(x) \in k[x]$, then $f(x), g(x)$ are
congruent mod $m$, denoted by

\[ f \equiv g \bmod m, \]

if $m \mid (f - g)$.

Take It Further. Throwing caution to the winds, here’s a fantastic generalization.
Rephrase congruence mod $m$ in $\mathbb{Z}$ in terms of ideals: since $m \mid (a - b)$
if and only if $a - b \in (m)$, we have $a \equiv b \bmod m$ if and only if $a - b \in (m)$.
Let $R$ be a commutative ring and $I$ be any, not necessarily principal, ideal in $R$.
If $a, b \in R$, define

\[ a \equiv b \bmod I \]

to mean $a - b \in I$ (we’ll actually use this generalization when we discuss
quotient rings in Chapter 7).


We can now rephrase the Division Algorithm in $k[x]$ using congruence. The
statement: given $m(x), f(x) \in k[x]$ with $m(x) \ne 0$, there exist $q(x), r(x) \in k[x]$
with $f = qm + r$, where $r = 0$ or $\deg(r) < \deg(m)$, can be rewritten to
say

\[ f \equiv r \bmod m. \]

And Proposition 6.14, the Remainder Theorem, says that if $m(x) = x - a$,
then $f \equiv f(a) \bmod (x - a)$. Thus, the constraints on $f$ can be rewritten

\[
\begin{aligned}
f &\equiv 12 \bmod (x + 3)\\
f &\equiv 22 \bmod (x - 2)\\
f &\equiv 72 \bmod (x - 3)\\
f &\equiv -26 \bmod (x + 4).
\end{aligned}
\]


Notice that the four linear polynomials are pairwise relatively prime. 

Let’s push the similarity a little further, using the localization idea on page 146. 
Suppose we can find polynomials $g$, $h$, $k$, and $\ell$ satisfying

\[
\begin{array}{llll}
g(-3) = 1 & h(-3) = 0 & k(-3) = 0 & \ell(-3) = 0\\
g(2) = 0 & h(2) = 1 & k(2) = 0 & \ell(2) = 0\\
g(3) = 0 & h(3) = 0 & k(3) = 1 & \ell(3) = 0\\
g(-4) = 0 & h(-4) = 0 & k(-4) = 0 & \ell(-4) = 1.
\end{array}
\]


Setting $f = 12g + 22h + 72k - 26\ell$, we have a polynomial that fits the
original table (why?). Now Proposition 6.15, the Factor Theorem, shows that
$g$ is divisible by the linear polynomials $x - 2$, $x - 3$, and $x + 4$. Since they are
irreducible in $\mathbb{Q}[x]$, they are pairwise relatively prime; hence, Exercise 6.17(h)
on page 247 says that $g$ is divisible by their product: there is $A(x)$ such that

\[ g(x) = A(x - 2)(x - 3)(x + 4). \]

In fact, we can choose $A$ to be a constant and have $g(-3) = 1$: set

\[ 1 = g(-3) = A(-3 - 2)(-3 - 3)(-3 + 4); \]

that is, $A = 1/30$ and $g(x) = \frac{1}{30}(x - 2)(x - 3)(x + 4)$. Similarly,

• $h(x) = B(x + 3)(x - 3)(x + 4)$ and $h(2) = 1$ implies $B = -1/30$, so

\[ h(x) = -\tfrac{1}{30}(x + 3)(x - 3)(x + 4) \]

• $k(x) = C(x + 3)(x - 2)(x + 4)$ and $k(3) = 1$ implies $C = 1/42$, so

\[ k(x) = \tfrac{1}{42}(x + 3)(x - 2)(x + 4) \]

• $\ell(x) = D(x + 3)(x - 2)(x - 3)$ and $\ell(-4) = 1$ implies $D = -1/42$, so

\[ \ell(x) = -\tfrac{1}{42}(x + 3)(x - 2)(x - 3). \]

Now putting $f = 12g + 22h + 72k - 26\ell$, we have, after simplification (carried
out by a CAS):

\[ f(x) = 2x^3 + 4x^2 - 8x + 6. \]

You can check that $f$ matches the table.
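The simplification step can be reproduced without a CAS. Here is a small Python sketch using exact rational arithmetic; the function names `poly_mul` and `lagrange` are hypothetical, and polynomials are assumed to be coefficient lists with the constant term first.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists (constant term first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def lagrange(points):
    """Coefficients of the polynomial of degree < len(points) through the points.

    Assumes the inputs (first coordinates) are distinct.
    """
    total = [Fraction(0)] * len(points)
    for i, (a_i, b_i) in enumerate(points):
        basis = [Fraction(1)]      # product of (x - a_j) over j != i
        denom = Fraction(1)        # the same product evaluated at x = a_i
        for j, (a_j, _) in enumerate(points):
            if j != i:
                basis = poly_mul(basis, [Fraction(-a_j), Fraction(1)])
                denom *= a_i - a_j
        for d, c in enumerate(basis):
            total[d] += Fraction(b_i) * c / denom
    return total

table = [(-3, 12), (2, 22), (3, 72), (-4, -26)]
f = lagrange(table)
print([int(c) for c in f])   # [6, -8, 4, 2], that is, 2x^3 + 4x^2 - 8x + 6
```

The inner products `basis / denom` are exactly the polynomials $g$, $h$, $k$, $\ell$ built by hand above.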

This method is called Lagrange Interpolation, and it applies to any finite 
set of input-output pairs. As we just saw, it’s the same method used in the 
proof of the Chinese Remainder Theorem, but applied to polynomials rather 
than integers. Since our goal is merely to display connections, we leave the 
proof to the reader. 

Theorem 6.69 (Chinese Remainder Theorem for Polynomials). Let $k$ be
a field. If $m_1, \ldots, m_r \in k[x]$ are pairwise relatively prime and $b_1, \ldots, b_r \in k[x]$,
then the simultaneous congruences

\[
\begin{aligned}
f &\equiv b_1 \bmod m_1\\
f &\equiv b_2 \bmod m_2\\
&\;\;\vdots\\
f &\equiv b_r \bmod m_r
\end{aligned}
\]

have an explicit solution, namely,

\[ f = b_1(s_1 M_1) + b_2(s_2 M_2) + \cdots + b_r(s_r M_r), \]

where

\[ M_i = m_1 m_2 \cdots \widehat{m_i} \cdots m_r \quad\text{and}\quad s_i M_i \equiv 1 \bmod m_i \text{ for } 1 \le i \le r. \]

Furthermore, any solution to this system is congruent to $f \bmod m_1 m_2 \cdots m_r$.

Proof. The proof of Theorem 4.27 can be easily adapted to prove this. ■

Example 6.70. The calculations we have just made can be used to illustrate
the theorem. The table gives

\[ b_1 = 12, \quad b_2 = 22, \quad b_3 = 72, \quad b_4 = -26; \]

applying the Remainder Theorem to the table entries,

\[ m_1 = x + 3, \quad m_2 = x - 2, \quad m_3 = x - 3, \quad m_4 = x + 4. \]

Then

\[
\begin{aligned}
M_1 &= (x - 2)(x - 3)(x + 4),\\
M_2 &= (x + 3)(x - 3)(x + 4),\\
M_3 &= (x + 3)(x - 2)(x + 4),\\
M_4 &= (x + 3)(x - 2)(x - 3),
\end{aligned}
\]

and

\[ s_1 = 1/30, \quad s_2 = -1/30, \quad s_3 = 1/42, \quad s_4 = -1/42. \]

Note, for example, that

\[
\begin{aligned}
s_1 M_1 - 1 &= \tfrac{1}{30}(x - 2)(x - 3)(x + 4) - 1\\
&= \frac{x^3 - x^2 - 14x + 24}{30} - 1\\
&= \frac{x^3 - x^2 - 14x - 6}{30}\\
&= (x + 3)\,\frac{x^2 - 4x - 2}{30},
\end{aligned}
\]

so that $s_1 M_1 \equiv 1 \bmod m_1$. This is not magic; if you look carefully at how $s_1$
is calculated (we called it $A$ on the previous page), you’ll see that it is none
other than

\[ 1/M_1(-3). \]

And the Remainder Theorem (again) says that the remainder when $g(x) =
M_1(x)/M_1(-3)$ is divided by $x + 3$ is

\[ g(-3) = M_1(-3)/M_1(-3) = 1. \]

Similarly, we have $s_i M_i \equiv 1 \bmod m_i$ for the other values of $i$.

Finally, the statement about any other solution to the system follows from
Exercise 6.53 on page 269. ▲
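The factorization of $s_1 M_1 - 1$ displayed in this example can be double-checked by synthetic division. This is only an illustrative sketch; `divide_by_linear` is a hypothetical helper, and coefficients are listed with the highest degree first.

```python
from fractions import Fraction

def divide_by_linear(coeffs, a):
    """Synthetic division by x - a (coefficients listed highest degree first).

    Returns (quotient coefficients, remainder); by the Remainder Theorem
    the remainder equals the value of the polynomial at a.
    """
    acc = []
    carry = Fraction(0)
    for c in coeffs:
        carry = carry * a + c
        acc.append(carry)
    return acc[:-1], acc[-1]

# s_1 M_1 - 1 = (x^3 - x^2 - 14x - 6) / 30
s1M1_minus_1 = [Fraction(n, 30) for n in (1, -1, -14, -6)]
q, r = divide_by_linear(s1M1_minus_1, -3)    # divide by x + 3
print(r)                                     # 0
print([int(30 * c) for c in q])              # [1, -4, -2], i.e. (x^2 - 4x - 2)/30
```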


Compare the statement of Theorem 6.69 to the more typical statement of 
Lagrange Interpolation. 

Theorem 6.71 (Lagrange Interpolation). Let $k$ be a field. An explicit way of
writing the polynomial $f(x) \in k[x]$ of minimal degree that takes the values $b_i$
at distinct points $a_i$, for $1 \le i \le r$, is

\[ f(x) = b_1\frac{M_1(x)}{M_1(a_1)} + b_2\frac{M_2(x)}{M_2(a_2)} + b_3\frac{M_3(x)}{M_3(a_3)} + \cdots + b_r\frac{M_r(x)}{M_r(a_r)}, \]

where $M_i(x)$ is the polynomial defined by

\[ M_i(x) = (x - a_1)(x - a_2) \cdots (x - a_{i-1})\,\widehat{(x - a_i)}\,(x - a_{i+1}) \cdots (x - a_r). \]


Some things to note:

(i) Theorem 6.69 is more general than Theorem 6.71, for it allows the moduli
$m_i$ to be any finite set of relatively prime polynomials (Lagrange Interpolation
only considers moduli of the form $x - a$).

(ii) On the other hand, Theorem 6.71 is more explicit: it implies that $s_i$ (in
the statement of the Chinese Remainder Theorem) is $1/M_i(a_i)$ (in the
statement of Lagrange Interpolation).

(iii) The statement of Lagrange Interpolation goes on to say that the polynomial
obtained by this method is the one of lowest degree that fits the
conditions.

You’ll verify these last two items in Exercises 6.60 and 6.63 below.

Exercises

6.56 Find a polynomial that agrees with the table

    Input   Output
    0       3
    1       4
    2       7
    3       48
    4       211

The result of Exercise 6.53
allows you to fool many
standardized tests.


6.57 A radio show offered a prize to the first caller who could predict the next term in 
the sequence 

1, 2, 4, 8, 16.

(i) What would you get if you used “common sense?” 

(ii) What would you get if you used Lagrange Interpolation? 

6.58 Another radio show offered a prize to the first caller who could predict the next 
term in the sequence 

14, 3, 26, 8, 30.


After no one got it for a few days, the host announced that these are the first 
five numbers that were retired from the Mudville Sluggers baseball team. Use 
Lagrange Interpolation to predict the next number that was retired in Mudville. 

6.59 The following table fits the quadratic $f(x) = x^2 - 3x + 5$; that is, $f(0) = 5$,
$f(1) = 3$, etc. Now forget about $f$ and use Lagrange Interpolation to find a
polynomial that fits the table.

    Input   Output
    0       5
    1       3
    2       3
    3       5
    4       9

It seems that this table should fool Lagrange Interpolation, which produces a
degree 4 polynomial. Does it?

6.60 * Show that Lagrange Interpolation produces a polynomial of smallest degree that 
agrees with a given input-output table. 

6.61 (i) Find a polynomial $g(x)$ that agrees with the table

    Input   Output
    4       24
    5       60
    6       120
    7       210
    8       336

(ii) Factor $g$ into irreducibles.

6.62 It is known that there’s a cubic polynomial function $f(x) \in \mathbb{Q}[x]$ such that, for
positive integers $n$,

\[ f(n) = \sum_{k=0}^{n-1} k^2. \]

(i) Find $f$.

Hint. A cubic is determined by four inputs.

(ii) Prove that $f(n) = \sum_{k=0}^{n-1} k^2$ for all positive integers $n$.

6.63 * Using the notation of Theorem 6.71, show that

\[ M_i(x)/M_i(a_i) \equiv 1 \bmod (x - a_i). \]





7 Quotients, Fields, and Classical Problems


In Chapter 4, we introduced the idea of congruence modulo an integer $m$ as a
way to “ignore” multiples of $m$ in calculations by concentrating on remainders.
This led to an arithmetic of congruence classes and the construction of the
commutative ring $\mathbb{Z}_m$ in which multiples of $m$ are set equal to 0. You now
know that the multiples of $m$ form an ideal $(m)$ in $\mathbb{Z}$, and so $\mathbb{Z}_m$ can be thought
of as a commutative ring obtained from $\mathbb{Z}$ in which all the elements of $(m)$ are
set equal to 0.

In this chapter, we introduce quotient rings, a generalization of this
construction. Given a commutative ring $R$ and an ideal $I \subseteq R$, we will produce a
new commutative ring $R/I$ that forces all the elements of $I$ to be 0.

In particular, beginning with the commutative ring $k[x]$ and the ideal $(f)$,
where $k$ is a field and $f(x) \in k[x]$, we shall see that identifying $(f)$ with 0
produces an element $\alpha$ in the quotient ring $k[x]/(f)$ that is a root of $f$: if
$f(x) = c_0 + c_1 x + \cdots + c_n x^n$, then $f(\alpha) = c_0 + c_1\alpha + \cdots + c_n\alpha^n = 0$.
Moreover, the complex number field is a special case: $\mathbb{C}$ is the quotient ring
arising from $\mathbb{R}[x]$ and the ideal $(x^2 + 1)$. Another byproduct of the quotient
ring construction is the existence and classification of all finite fields (there are
others besides $\mathbb{F}_p$ and $\mathbb{F}_4$).

In the last section, we will apply fields to settle classical geometric problems
that arose over two millennia ago: using only ruler and compass, can we
duplicate the cube, trisect an angle, square the circle, or construct regular
$n$-gons?


7.1 Quotient Rings 

In Chapter 3, we said that the approach of many Renaissance mathematicians
to the newly invented complex numbers was to consider them as polynomials
or rational functions in $i$, where calculations are carried out as usual with
the extra simplification rule $i^2 = -1$. In constructing $\mathbb{Z}_m$ from $\mathbb{Z}$, we
ignored multiples of $m$ (that is, we set them all equal to 0), and we saw that this
idea is compatible with addition and multiplication. Let’s see if we can mimic
this idea, starting with $\mathbb{R}[x]$, and apply it to $\mathbb{C}$, as our Renaissance ancestors
wished. Can we replace the symbol $x$ in a polynomial $f(x)$ by a new symbol
$i$ that satisfies $i^2 = -1$? Well, if we make $x^2 = -1$, then we are setting
$x^2 + 1 = 0$. So, as with constructing $\mathbb{Z}_m$, let’s set all the multiples of $x^2 + 1$
in $\mathbb{R}[x]$ equal to 0.

This analogy with $\mathbb{Z}_m$ looks promising: let the commutative ring $\mathbb{R}[x]$
correspond to $\mathbb{Z}$, and let the (principal) ideal $(x^2 + 1)$ in $\mathbb{R}[x]$ correspond to
the ideal $(m)$ in $\mathbb{Z}$. Push this analogy further. Elements in $\mathbb{Z}_m$ are congruence
classes $[a]$, where $a \in \mathbb{Z}$. Let’s invent new elements $[f]$ corresponding to
polynomials $f(x)$. In more detail, $[a]$ denotes $\{a + qm : q \in \mathbb{Z}\}$, and we forced
$qm = 0$ (actually, using the Division Algorithm in $\mathbb{Z}$, this allowed us to focus
on remainders after dividing by $m$). Defining $[f] = \{f(x) + q(x)(x^2 + 1) :
q(x) \in \mathbb{R}[x]\}$ would allow us to focus on the remainder after dividing $f(x)$ by
$x^2 + 1$. Indeed, the Division Algorithm in $\mathbb{R}[x]$ writes

\[ f(x) = q(x)(x^2 + 1) + r(x), \]

where $r(x) = 0$ or $\deg(r) < 2$. In other words, we could write $[f] = [r]$,
where $r(x) = a + bx$ for $a, b \in \mathbb{R}$. Hold it! If the bracket notation makes
$x^2 + 1 = 0$, then $x^2 = -1$, and we may as well write $i$ instead of $[x]$. Looks a
lot like $\mathbb{C}$ to us! Now it turns out that this idea is also compatible with addition
and multiplication, as we shall see when we introduce quotient rings precisely.
The construction makes sense for any commutative ring $R$ and any ideal $I$
in $R$; moreover, it constructs not only $\mathbb{C}$ but many other important systems
as well.


Definition. Let $I$ be an ideal in a commutative ring $R$. We say that $a, b \in R$
are congruent mod $I$, written

\[ a \equiv b \bmod I, \]

if $a - b \in I$.

This does generalize our earlier definition of congruence when $R = \mathbb{Z}$,
$m \ge 0$, and $I = (m)$. If $a, b \in \mathbb{Z}$ and $a \equiv b \bmod (m)$, then $a - b \in (m)$. But
$a - b \in (m)$ if and only if $m \mid a - b$; that is, $a \equiv b \bmod m$ in the old sense.
We now note that congruence mod $I$ is an equivalence relation on $R$.

Proposition 7.1. Let $a, b, c$ be elements in a commutative ring $R$. If $I$ is an
ideal in $R$, then

(i) $a \equiv a \bmod I$

(ii) if $a \equiv b \bmod I$, then $b \equiv a \bmod I$

(iii) if $a \equiv b \bmod I$ and $b \equiv c \bmod I$, then $a \equiv c \bmod I$.

Proof. Just modify the proof of Proposition 4.3. ■

The next result shows that the new notion of congruence is compatible with 
addition and multiplication of elements in R. 


Proposition 7.2. Let $I$ be an ideal in a commutative ring $R$.

(i) If $a \equiv a' \bmod I$ and $b \equiv b' \bmod I$, then

\[ a + b \equiv a' + b' \bmod I. \]

More generally, if $a_i \equiv a_i' \bmod I$ for $i = 1, \ldots, k$, then

\[ a_1 + \cdots + a_k \equiv a_1' + \cdots + a_k' \bmod I. \]

(ii) If $a \equiv a' \bmod I$ and $b \equiv b' \bmod I$, then

\[ ab \equiv a'b' \bmod I. \]

More generally, if $a_i \equiv a_i' \bmod I$ for $i = 1, \ldots, k$, then

\[ a_1 \cdots a_k \equiv a_1' \cdots a_k' \bmod I. \]

(iii) If $a \equiv b \bmod I$, then

\[ a^k \equiv b^k \bmod I \text{ for all } k \ge 1. \]

Proof. This is a straightforward modification of the proof of Proposition 4.5.
For example, here is the proof of Proposition 4.5(i). If $m \mid (a - a')$ and
$m \mid (b - b')$, then $m \mid (a + b) - (a' + b')$, because $(a + b) - (a' + b') = (a - a') +
(b - b')$. Rewrite this here by changing “$m \mid (a - a')$” to “$a - a' \in I$.” ■

We now mimic the construction of the commutative rings Z m by first gen- 
eralizing the idea of a congruence class. 

Definition. Let $I$ be an ideal in a commutative ring $R$. If $a \in R$, then the coset
$a + I$ is the subset

\[ a + I = \{a + z : z \in I\} \subseteq R. \]

Thus, the coset $a + I$ is the set of all those elements in $R$ that are congruent
to $a \bmod I$. Cosets generalize the notion of congruence class and so, by
analogy, the coset $a + I$ is often called $a$ mod $I$.

Proposition 7.3. If $R = \mathbb{Z}$, $I = (m)$, and $a \in \mathbb{Z}$, then the coset

\[ a + I = a + (m) = \{a + km : k \in \mathbb{Z}\} \]

is equal to the congruence class $[a] = \{n \in \mathbb{Z} : n \equiv a \bmod m\}$.

Proof. If $u \in a + (m)$, then $u = a + km$ for some $k \in \mathbb{Z}$. Hence, $u - a = km$,
$m \mid (u - a)$, $u \equiv a \bmod m$, and $u \in [a]$; that is, $a + (m) \subseteq [a]$.

For the reverse inclusion, if $v \in [a]$, then $v \equiv a \bmod m$, $m \mid (v - a)$,
$v - a = \ell m$ for some $\ell \in \mathbb{Z}$, and $v = a + \ell m \in a + (m)$. Therefore,
$[a] \subseteq a + (m)$, and so $a + (m) = [a]$. ■

In Proposition 7.1, we saw that congruence mod $I$ is an equivalence relation
on $R$; in Exercise 7.6 on page 285, you will prove that if $a \in R$, then its
equivalence class is the coset $a + I$. It follows that the family of all cosets is
a partition of $R$ (see Proposition A.17); that is, cosets are nonempty, $R$ is the
union of the cosets, and distinct cosets are disjoint: if $a + I \ne b + I$, then
$(a + I) \cap (b + I) = \varnothing$.

When are two cosets mod $I$ the same? In Proposition 4.2, we answered this
question by proving that $a \equiv b \bmod m$ if and only if each of $a$ and $b$ has the
same remainder after dividing by $m$.

Proposition 7.4. Let $I$ be an ideal in a commutative ring $R$. If $a, b \in R$, then
$a + I = b + I$ if and only if $a \equiv b \bmod I$. In particular, $a + I = I$ if and
only if $a \in I$.

Proof. Note first that $a \in a + I$, for $0 \in I$ and $a = a + 0$. If $a + I = b + I$,
then $a \in b + I$; hence, $a = b + i$ for some $i \in I$, and so $a - b \in I$ and
$a \equiv b \bmod I$.

Conversely, assume that $a - b \in I$; say $a - b = i$. To see whether $a + I \subseteq
b + I$, we must show that if $a + i' \in a + I$, where $i' \in I$, then $a + i' \in b + I$.
But $a + i' = (b + i) + i' = b + (i + i') \in b + I$ (for ideals are closed under
addition). The reverse inclusion, $b + I \subseteq a + I$, is proved similarly. Therefore,
$a + I = b + I$. ■

We have now generalized congruence mod $m$ to congruence mod an ideal
and congruence classes to cosets. The next step is to assemble the cosets and
make a commutative ring with them.

Definition. If $I$ is an ideal in a commutative ring $R$, we denote the set of all
its cosets by $R/I$:

\[ R/I = \{a + I : a \in R\}. \]

Once the set $\mathbb{Z}_m$ was built, we equipped it with the structure of a commutative
ring by defining addition and multiplication of congruence classes. We
carry out that program now for $R/I$.

Definition. Let $I$ be an ideal in a commutative ring $R$.

Define addition $\alpha\colon R/I \times R/I \to R/I$ by

\[ \alpha\colon (a + I, b + I) \mapsto a + b + I \]

and define multiplication $\mu\colon R/I \times R/I \to R/I$ by

\[ \mu\colon (a + I, b + I) \mapsto ab + I. \]

Example 7.5. Suppose that $R = \mathbb{Z}[x]$ and $I$ is the principal ideal $(x^2 + x + 1)$.
If $a = 3 + 2x$ and $b = 4 + 3x$, then

\[ (a + I)(b + I) = ab + I = (3 + 2x)(4 + 3x) + I = 12 + 17x + 6x^2 + I. \]

But, by Exercise 7.4(ii) on page 285, $12 + 17x + 6x^2 \equiv 6 + 11x \bmod I$ (in
fact, $(12 + 17x + 6x^2) - (6 + 11x) = 6(x^2 + x + 1)$), so that

\[ (3 + 2x + I)(4 + 3x + I) = 6 + 11x + I. \] ▲
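Coset multiplication of this kind amounts to "multiply, then take the remainder." The sketch below illustrates that with integer coefficient lists (constant term first); `reduce_mod` and `mul_mod` are hypothetical names, and the modulus is assumed to be monic so that no division of coefficients is needed.

```python
def reduce_mod(f, m):
    """Remainder of f on division by a monic polynomial m.

    Both are coefficient lists with the constant term first; m[-1] must be 1.
    """
    f = list(f)
    d = len(m) - 1                 # degree of the modulus
    while len(f) > d:
        lead = f.pop()             # coefficient of x**k, where k = len(f) after the pop
        start = len(f) - d         # subtract lead * x**(k - d) * m
        for i in range(d):
            f[start + i] -= lead * m[i]
    return f

def mul_mod(a, b, m):
    """Representative of (a + I)(b + I) of degree < deg m, where I = (m)."""
    prod = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            prod[i + j] += x * y
    return reduce_mod(prod, m)

m = [1, 1, 1]                            # x^2 + x + 1
print(mul_mod([3, 2], [4, 3], m))        # [6, 11], i.e. 6 + 11x
print(reduce_mod([0, 0, 0, 1], m))       # [1, 0], i.e. x^3 is congruent to 1
```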

Lemma 7.6. Addition and multiplication $R/I \times R/I \to R/I$ are well-defined
functions.

Proof. Let $a + I = a' + I$ and $b + I = b' + I$; that is, $a - a' \in I$ and
$b - b' \in I$.

To see that addition is well-defined, we must show that $a' + b' + I =
a + b + I$. This is true:

\[ (a + b) - (a' + b') = (a - a') + (b - b') \in I. \]

To see that multiplication $R/I \times R/I \to R/I$ is well-defined, we must
show that $(a' + I)(b' + I) = a'b' + I = ab + I$; that is, $ab - a'b' \in I$. But
this is true:

\[ ab - a'b' = ab - a'b + a'b - a'b' = (a - a')b + a'(b - b') \in I. \] ■

The proof of Theorem 4.32, which shows that $\mathbb{Z}_m$ is a commutative ring,
generalizes to show that $R/I$ is a commutative ring. Here are the details.

Theorem 7.7. If $I$ is an ideal in a commutative ring $R$, then $R/I$ is a
commutative ring.

Proof. Each of the eight axioms in the definition of commutative ring must be
verified; all the proofs are routine, for they are inherited from the corresponding
property in $R$. If $a, b, c \in R$, then we have

(i) Commutativity of addition:

\[ (a + I) + (b + I) = a + b + I = b + a + I = (b + I) + (a + I). \]

(ii) The zero element is $I = 0 + I$, for $I + (a + I) = 0 + a + I = a + I$.

(iii) The negative of $a + I$ is $-a + I$, for $(a + I) + (-a + I) = 0 + I = I$.

(iv) Associativity of addition:

\[
\begin{aligned}
[(a + I) + (b + I)] + (c + I) &= (a + b + I) + (c + I)\\
&= [(a + b) + c] + I = [a + (b + c)] + I\\
&= (a + I) + (b + c + I) = (a + I) + [(b + I) + (c + I)].
\end{aligned}
\]

(v) Commutativity of multiplication:

\[ (a + I)(b + I) = ab + I = ba + I = (b + I)(a + I). \]

(vi) The multiplicative identity is $1 + I$, for $(1 + I)(a + I) = 1a + I = a + I$.

(vii) Associativity of multiplication:

\[
\begin{aligned}
[(a + I)(b + I)](c + I) &= (ab + I)(c + I)\\
&= [(ab)c] + I = [a(bc)] + I\\
&= (a + I)(bc + I) = (a + I)[(b + I)(c + I)].
\end{aligned}
\]

(viii) Distributivity:

\[
\begin{aligned}
(a + I)[(b + I) + (c + I)] &= (a + I)(b + c + I)\\
&= [a(b + c)] + I = (ab + ac) + I\\
&= (ab + I) + (ac + I)\\
&= (a + I)(b + I) + (a + I)(c + I).
\end{aligned}
\] ■

Definition. The commutative ring $R/I$ constructed in Theorem 7.7 is called
the quotient ring of $R$ modulo $I$ (it is usually pronounced “$R$ mod $I$”).

We said that quotient rings generalize the construction of $\mathbb{Z}_m$. Let’s show
that the commutative rings $\mathbb{Z}/(m)$ and $\mathbb{Z}_m$ are not merely isomorphic, they are
identical.

We have already seen, in Proposition 7.3, that they have the same elements:
for every $a \in \mathbb{Z}$, the coset $a + (m)$ and the congruence class $[a]$ are subsets
of $\mathbb{Z}$, and they are equal. But the operations coincide as well. They have the
same addition:

\[ (a + (m)) + (b + (m)) = a + b + (m) = [a + b] = [a] + [b] \]

and they have the same multiplication:

\[ (a + (m))(b + (m)) = ab + (m) = [ab] = [a][b]. \]

Thus, quotient rings truly generalize the integers mod $m$.

If $I = R$, then $R/I$ consists of only one coset, and so $R/I$ is the zero ring
(in Chapter 4, we said that the zero ring does arise occasionally). Since the
zero ring is not very interesting, we usually assume, when forming quotient
rings, that ideals are proper ideals. Recall, in constructing $\mathbb{Z}_m$, that we usually
assumed that $m \ge 2$.

The word “map” is often
used as a synonym for
function or homomorphism.

The definitions of addition and multiplication in $R/I$ involve an interplay
between reducing modulo the ideal $I$ and the operations of addition and
multiplication in $R/I$. In the special case of $\mathbb{Z}_m$, we called this interplay “reduce as
you go.” But that’s just an informal way of describing a homomorphism. More
precisely, if we define a function $\pi\colon R \to R/I$ by $\pi\colon a \mapsto a + I$, then we can
rewrite $a + b + I = (a + I) + (b + I)$ as $\pi(a + b) = \pi(a) + \pi(b)$; similarly,
$ab + I = (a + I)(b + I)$ can be rewritten as $\pi(ab) = \pi(a)\pi(b)$.

Definition. If $I$ is an ideal in a commutative ring $R$, then the natural map is
the function $\pi\colon R \to R/I$ given by

\[ a \mapsto a + I; \]

that is, $\pi(a) = a + I$.


Proposition 7.8. If $I$ is an ideal in a commutative ring $R$, then the natural
map $\pi\colon R \to R/I$ is a surjective homomorphism, and $\ker \pi = I$.

Proof. We have just seen that $\pi(a + b) = \pi(a) + \pi(b)$ and $\pi(ab) = \pi(a)\pi(b)$.
Since $\pi(1) = 1 + I$, the multiplicative identity in $R/I$, we see that $\pi$ is a
homomorphism.

Now $\pi$ is surjective: if $a + I \in R/I$, then $a + I = \pi(a)$. Finally, by
definition, $\ker \pi = \{a \in R : \pi(a) = 0 + I\}$. But $\pi(a) = a + I$, and
$a + I = 0 + I$ if and only if $a \in I$ (Proposition 7.4). The result follows. ■
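For $R = \mathbb{Z}$ and $I = (m)$, the natural map is "take the remainder mod $m$," and the proposition can be spot-checked on a finite range of integers. This is only an illustration under that assumption, with each coset $a + (m)$ represented by its remainder; the name `natural_map` is hypothetical.

```python
def natural_map(a, m):
    """pi: Z -> Z/(m), with the coset a + (m) represented by its remainder."""
    return a % m

m = 12
sample = range(-20, 21)

# "Reduce as you go": pi respects addition and multiplication.
assert all(natural_map(a + b, m) == (natural_map(a, m) + natural_map(b, m)) % m
           for a in sample for b in sample)
assert all(natural_map(a * b, m) == (natural_map(a, m) * natural_map(b, m)) % m
           for a in sample for b in sample)

# Elements of the sample window that lie in the kernel are the multiples of m.
kernel = [a for a in range(-24, 25) if natural_map(a, m) == 0]
print(kernel)   # [-24, -12, 0, 12, 24]
```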

Here is the converse of Proposition 5.25: Every ideal is the kernel of some
homomorphism.

Corollary 7.9. Given an ideal $I$ in a commutative ring $R$, there exists a
commutative ring $A$ and a homomorphism $\varphi\colon R \to A$ with $I = \ker \varphi$.

Proof. If we set $A = R/I$, then the natural map $\pi\colon R \to R/I$ is a
homomorphism with $I = \ker \pi$. ■

We know that isomorphic commutative rings are essentially the same, being
“translations” of one another; that is, if $\varphi\colon R \to S$ is an isomorphism, we may
think of $r \in R$ as being in English while $\varphi(r) \in S$ is in French. The next
theorem shows that quotient rings are essentially images of homomorphisms.
It also shows how to modify a homomorphism to make it an isomorphism.


There are second and third 
isomorphism theorems, 
but they are less useful 
(see Exercise 7.15 on 
page 286). 


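Theorem 7.10 (First Isomorphism Theorem). Let $R$ and $A$ be commutative
rings. If $\varphi\colon R \to A$ is a homomorphism, then $\ker \varphi$ is an ideal in $R$, $\operatorname{im} \varphi$ is a
subring of $A$, and

\[ R/\ker \varphi \cong \operatorname{im} \varphi. \]

Proof. Let $I = \ker \varphi$. We have already seen, in Proposition 5.25, that $I$ is an
ideal in $R$ and $\operatorname{im} \varphi$ is a subring of $A$.

Define $\tilde{\varphi}\colon R/I \to \operatorname{im} \varphi$ by

\[ \tilde{\varphi}(r + I) = \varphi(r). \]

We claim that $\tilde{\varphi}$ is an isomorphism. First, $\tilde{\varphi}$ is well-defined. If $r + I = s + I$,
then $r - s \in I = \ker \varphi$, $\varphi(r - s) = 0$, and $\varphi(r) = \varphi(s)$. Hence

\[ \tilde{\varphi}(r + I) = \varphi(r) = \varphi(s) = \tilde{\varphi}(s + I). \]

Next, $\tilde{\varphi}$ is a homomorphism because $\varphi$ is.

\[
\begin{aligned}
\tilde{\varphi}((r + I) + (s + I)) &= \tilde{\varphi}(r + s + I)\\
&= \varphi(r + s) = \varphi(r) + \varphi(s)\\
&= \tilde{\varphi}(r + I) + \tilde{\varphi}(s + I).
\end{aligned}
\]

Similarly, $\tilde{\varphi}((r + I)(s + I)) = \tilde{\varphi}(r + I)\tilde{\varphi}(s + I)$ (Exercise 7.7 on page 285).
As $\tilde{\varphi}(1 + I) = \varphi(1) = 1$, we see that $\tilde{\varphi}$ is a homomorphism.

We show that $\tilde{\varphi}$ is surjective. If $a \in \operatorname{im} \varphi$, then there is $r \in R$ with $a = \varphi(r)$;
plainly, $a = \varphi(r) = \tilde{\varphi}(r + I)$.

Finally, we show that $\tilde{\varphi}$ is injective. If $\tilde{\varphi}(r + I) = 0$, then $\varphi(r) = 0$, and
$r \in \ker \varphi = I$. Hence, $r + I = I$; that is, $\ker \tilde{\varphi} = \{I\}$ and $\tilde{\varphi}$ is injective, by
Proposition 5.31. Therefore, $\tilde{\varphi}$ is an isomorphism. ■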

We can illustrate this last proof with a picture; such a picture is often called
a commutative diagram if composites of maps having same domain and same
target are equal. Here, $i\colon \operatorname{im} \varphi \to A$ is the inclusion, and $\varphi = i\tilde{\varphi}\pi$.

[Commutative diagram: $\varphi\colon R \to A$ along the top, $\pi\colon R \to R/I$ down the left,
$\tilde{\varphi}\colon R/I \to \operatorname{im} \varphi$ along the bottom, and $i\colon \operatorname{im} \varphi \to A$ up the right.]

Here’s a trivial example. If $R$ is a commutative ring, then $(0)$ is an ideal.
The identity $1_R\colon R \to R$ is a surjective homomorphism with $\ker 1_R = (0)$, so
that the First Isomorphism Theorem gives the isomorphism $\tilde{1}_R\colon R/(0) \to R$;
that is, $R/(0) \cong R$.

Theorem 7.10 has more interesting applications than showing that $R/(0) \cong R$.
For example, it gives us the tools needed to tighten up the discussion of the
alternate construction of $\mathbb{C}$ that began this section.


Theorem 7.11. The quotient ring $\mathbb{R}[x]/(x^2 + 1)$ is a field isomorphic to the
complex numbers $\mathbb{C}$.

Proof. Consider the evaluation $\varphi\colon \mathbb{R}[x] \to \mathbb{C}$ (as in Corollary 5.21) with
$\varphi(x) = i$ and $\varphi(a) = a$ for all $a \in \mathbb{R}$; that is,

\[ \varphi\colon f(x) = a_0 + a_1 x + a_2 x^2 + \cdots \mapsto f(i) = a_0 + a_1 i + a_2 i^2 + \cdots. \]

Now $\varphi$ is surjective, for $a + ib = \varphi(a + bx)$, and so the First Isomorphism
Theorem gives an isomorphism $\tilde{\varphi}\colon \mathbb{R}[x]/\ker \varphi \to \mathbb{C}$, namely $f(x) + \ker \varphi \mapsto
f(i)$. But Corollary 6.26 gives $\ker \varphi = (x^2 + 1)$; therefore, $\mathbb{R}[x]/(x^2 + 1) \cong \mathbb{C}$
as commutative rings, by the First Isomorphism Theorem. We know that $\mathbb{C}$ is
a field, and any commutative ring isomorphic to a field must, itself, be a field.
Thus, the quotient ring $\mathbb{R}[x]/(x^2 + 1)$ is another construction of $\mathbb{C}$. ■
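The isomorphism can be seen concretely by representing the coset $a + bx + (x^2 + 1)$ as the pair $(a, b)$ and comparing products with ordinary complex multiplication. This is a sketch under that representation, restricted to integer coefficients so that floating-point arithmetic stays exact; the function names are hypothetical.

```python
def mul_mod_x2_plus_1(u, v):
    """Product of a + bx and c + dx modulo x^2 + 1, as a pair of coefficients."""
    (a, b), (c, d) = u, v
    # (a + bx)(c + dx) = ac + (ad + bc)x + bd x^2, and x^2 is congruent to -1
    return (a * c - b * d, a * d + b * c)

def to_complex(u):
    """Image of a + bx + (x^2 + 1) under the map sending x to i."""
    return complex(u[0], u[1])

u, v = (3, 2), (4, -5)
print(mul_mod_x2_plus_1(u, v))                                          # (22, -7)
print(to_complex(mul_mod_x2_plus_1(u, v)) == to_complex(u) * to_complex(v))   # True
```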


Hence, the high school
approach to complex
numbers contains the
germ of a correct idea.

How to Think About It. Because every element of $\mathbb{R}[x]$ is congruent to a
linear polynomial $a + bx \bmod (x^2 + 1)$, every element of $\mathbb{R}[x]/(x^2 + 1)$ can
be written as $a + bx + (x^2 + 1)$ for some real numbers $a$ and $b$.


Example 7.12. (i) Since $\mathbb{R}[x]/(x^2 + 1)$ is a field, every nonzero element in
it has a multiplicative inverse. Let’s find the inverse of an element $a +
bx + (x^2 + 1)$ by “pulling back” the formula in $\mathbb{C}$,

\[ \frac{1}{a + bi} = \frac{a - bi}{a^2 + b^2}, \]

to $\mathbb{R}[x]/(x^2 + 1)$, using the inverse of the isomorphism $\tilde{\varphi}$ in the proof of
Theorem 7.11. Now $\tilde{\varphi}^{-1}(a + bi) = a + bx + (x^2 + 1)$, so that

\[ \frac{1}{a + bx + (x^2 + 1)} = \frac{a - bx}{a^2 + b^2} + (x^2 + 1). \]

(ii) Euclidean Algorithm II gives another way of finding the inverse, writing
$\gcd(a + bx, x^2 + 1)$ as a linear combination of $a + bx$ and $x^2 + 1$. The
algorithms in Exercise 6.18(iii) produce the linear combination in $\mathbb{R}[x]$ (for $b \ne 0$):

\[ \frac{a - bx}{b^2}(a + bx) + 1 \cdot (x^2 + 1) = \frac{a^2 + b^2}{b^2}. \]

Dividing both sides by $(a^2 + b^2)/b^2$, we have

\[ \frac{a - bx}{a^2 + b^2}(a + bx) + \frac{b^2}{a^2 + b^2}(x^2 + 1) = 1. \]

Moving to $\mathbb{R}[x]/(x^2 + 1)$, this implies again that

\[ \bigl(a + bx + (x^2 + 1)\bigr)^{-1} = \frac{a - bx}{a^2 + b^2} + (x^2 + 1). \] ▲
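The inverse formula of this example can be verified with exact rational arithmetic. The sketch works in $\mathbb{Q}[x]/(x^2 + 1)$ rather than $\mathbb{R}[x]/(x^2 + 1)$ so that Python's `Fraction` type applies; cosets are again represented as pairs $(a, b)$ standing for $a + bx$, and the function names are hypothetical.

```python
from fractions import Fraction

def mul(u, v):
    """Product of a + bx and c + dx modulo x^2 + 1."""
    (a, b), (c, d) = u, v
    return (a * c - b * d, a * d + b * c)

def inverse(u):
    """Inverse of a + bx modulo x^2 + 1, namely (a - bx) / (a^2 + b^2)."""
    a, b = u
    n = a * a + b * b
    if n == 0:
        raise ZeroDivisionError("the zero coset has no inverse")
    return (Fraction(a, n), Fraction(-b, n))

u = (3, 2)
v = inverse(u)
print(v)            # (Fraction(3, 13), Fraction(-2, 13))
print(mul(u, v))    # (Fraction(1, 1), Fraction(0, 1))
```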


We end this section with a generalization of Theorem 7.11. If you chase
back the arguments to their source, you’ll see that all we needed is that $\mathbb{R}[x]$ is
a PID and $x^2 + 1$ is an irreducible element in $\mathbb{R}[x]$.

Proposition 7.13. If $R$ is a PID and $p$ is an irreducible element in $R$, then
$R/(p)$ is a field.

Proof. It suffices to show that every nonzero element $a + (p)$ in the commutative
ring $R/(p)$ has a multiplicative inverse. Since $a + (p) \ne 0$, we have
$a \notin (p)$; that is, $p \nmid a$. Since $R$ is a PID, Theorem 6.46 says that gcd’s exist
and are linear combinations. In particular, $\gcd(a, p) = 1$, so there are $s, t \in R$
with $sa + tp = 1$. Thus,

\[ 1 + (p) = sa + tp + (p) = sa + (p) = (s + (p))(a + (p)) \]

in $R/(p)$, and $(a + (p))^{-1} = s + (p)$. Therefore, $R/(p)$ is a field. ■
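The proof is constructive: the coefficient $s$ in $sa + tp = 1$ is the inverse. For the PID $R = \mathbb{Z}$ and a prime $p$, this can be sketched with the extended Euclidean algorithm. The function names below are hypothetical, and the code illustrates the argument for the integers only.

```python
def extended_gcd(a, b):
    """Return (g, s, t) with s*a + t*b == g, where g is a gcd of a and b."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t

def inverse_mod(a, p):
    """Inverse of a + (p) in Z/(p), read off from s*a + t*p = 1."""
    g, s, _ = extended_gcd(a % p, p)
    if g != 1:
        raise ValueError("a + (p) is not invertible")
    return s % p

p = 13
print([inverse_mod(a, p) for a in range(1, p)])
assert all(a * inverse_mod(a, p) % p == 1 for a in range(1, p))
```

If `p` is replaced by a composite number, `inverse_mod` raises an error for some nonzero cosets, which matches the fact that $\mathbb{Z}/(m)$ is not a field in that case.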
Exercises


7.1 Are any cosets of (5) in Z ideals? 

7.2 Prove Proposition 7.1. 

7.3 Prove Proposition 7.2. 


7.4 * In $\mathbb{Q}[x]/(x^2 + x + 1)$, write each term in the form $a + bx$ with $a, b \in \mathbb{Q}$.

(i) $(3 + 2x)(4 + 3x)$

(ii) $12 + 17x + 6x^2$

(iii) $x^2$

(iv) $x^3$

(v) $(1 - x)^2$

(vi) $(1 - x)(1 - x^2)$

(vii) $(a + bx)(a + bx^2)$

(viii) $(a + bx)^2$.


7.5 In $\mathbb{Q}[x]/(x^4 + x^3 + x^2 + x + 1)$, write each term in the form $a + bx + cx^2 + dx^3$
with $a, b, c, d$ rational numbers.

(i) $x^5$

(ii) $(1 - x)(1 - x^2)(1 - x^3)(1 - x^4)$

(iii) $(1 + x)(1 + x^2)(1 + x^3)(1 + x^4)$.

7.6 * In Proposition 7.1, we saw that if $I$ is an ideal in a commutative ring $R$, then
congruence mod $I$ is an equivalence relation on $R$. Prove that the equivalence
classes are the cosets mod $I$.

7.7 In the notation of Theorem 7.10, show that

\[ \tilde{\varphi}((r + I)(s + I)) = \tilde{\varphi}(r + I)\tilde{\varphi}(s + I). \]

7.8 * Let $\varphi\colon R \to S$ be an isomorphism of commutative rings. Assume that $I \subseteq R$
and $J \subseteq S$ are ideals and that $\varphi(I) = J$, where $\varphi(I) = \{\varphi(a) : a \in I\}$. Prove
that $\tilde{\varphi}\colon R/I \to S/J$, given by $\tilde{\varphi}\colon r + I \mapsto \varphi(r) + J$, is an isomorphism.

7.9 Let $I$ be an ideal in a commutative ring $R$.

(i) If $S$ is a subring of $R$ and $I \subseteq S$, prove that

\[ S/I = \{r + I : r \in S\} \]

is a subring of $R/I$.

(ii) If $J$ is an ideal in $R$ and $I \subseteq J$, prove that

\[ J/I = \{r + I : r \in J\} \]

is an ideal in $R/I$.

7.10 Show that the subring $\mathbb{Z}[x]/(x^2 + 1)$ of $\mathbb{R}[x]/(x^2 + 1)$ is isomorphic to the
Gaussian integers $\mathbb{Z}[i]$.

7.11 Show that there is an isomorphism of fields:

\[ \mathbb{R}[x]/(x^2 + 1) \cong \mathbb{R}[x]/(x^2 + x + 1). \]

Hint. Both are isomorphic to $\mathbb{C}$.

7.12 Show that 

Q[a]/(a 2 + a + 1) ^ Q[&>] = {u + va) : u, v € Q}, 
where co = j ^—1 + i n/3^. 

7.13 Show that the subring Z[.v]/ (.v 2 + a + 1) of R[a]/ (a 2 + a + 1) is isomorphic to 
the Eisenstein integers Z [co]. 
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7.14 For each element of Q[x]/ ( x 2 + x + 1), find the multiplicative inverse. 

(i) 3 + 2x + ( V 2 -\- x + 1) 

(ii) 5 — x + ( x 2 + x + 1) 

(iii) 15 + lx — 2x 2 + (x 2 + x + 1) 

(iv) a + bx + (.v 2 + x + 1) (in terms of a and b). 

7.15 * Prove the Third Isomorphism Theorem: If R is a commutative ring having 
ideals I ⊆ J, then J/I is an ideal in R/I, and there is an isomorphism 

(R/I)/(J/I) ≅ R/J. 

Hint. Show that the function φ: R/I → R/J, given by a + I ↦ a + J, is a 
homomorphism, and apply the First Isomorphism Theorem. 

7.16 For every commutative ring R, prove that R[x]/(x) ≅ R. 

7.17 An ideal I in a commutative ring R is called a prime ideal if I is a proper ideal 
such that ab ∈ I implies a ∈ I or b ∈ I. 

(i) If p is a prime number, prove that (p) is a prime ideal in Z. 

Hint. Euclid’s Lemma. 

(ii) Prove that if an ideal (m) in Z is a prime ideal, then m = 0 or |m| is a prime 
number. 

7.18 Let I be a proper ideal in k[x], where k is a field. 

(i) If p is an irreducible polynomial, prove that (p) is a prime ideal in k[x]. 

(ii) Prove that if an ideal (f) in k[x] is a prime ideal, then f = 0 or f is an 
irreducible polynomial. 

7.19 Let I be a proper ideal in a commutative ring R. 

(i) Prove that (0) is a prime ideal in R if and only if R is a domain. 

(ii) Prove that I is a prime ideal if and only if a ∉ I and b ∉ I imply ab ∉ I. 

(iii) Prove that I is a prime ideal if and only if R/I is a domain. 

7.20 Prove that (x) is a prime ideal in Z[x]. 

Hint. Is Z[x]/(x) a domain? 

7.21 An ideal I in a commutative ring R is called a maximal ideal if I is a proper ideal 
for which there is no proper ideal J with I ⊊ J. 

(i) If p is a prime number, prove that (p) is a maximal ideal in Z. 

(ii) Prove that if an ideal (m) in Z is a maximal ideal, then |m| is a prime number. 

7.22 Let I be a proper ideal in k[x], where k is a field. 

(i) If p is an irreducible polynomial, prove that (p) is a maximal ideal in k[x]. 

(ii) Prove that if an ideal (f) in k[x] is a maximal ideal, then f is an irreducible 
polynomial. 

7.23 * Let I be a proper ideal in a commutative ring R. 

(i) Prove that (0) is a maximal ideal in R if and only if R is a field. 

(ii) Prove that I is a maximal ideal if and only if R/I is a field. Conclude that if 
k is a field and p(x) ∈ k[x] is irreducible, then k[x]/(p) is a field. 

(iii) Prove that every maximal ideal is a prime ideal. 

7.24 (i) Prove that J is a maximal ideal in Z[x], where J consists of all polynomials 
with even constant term. 

Hint. Prove that Z[x]/J ≅ F_2. 

(ii) Prove that the prime ideal (x) in Z[x] is not a maximal ideal. 

7.2 Field Theory 

General results about quotient rings R/I have a special character when R 
enjoys extra hypotheses. In this section, we investigate properties of fields with 
an eye to using the ideas behind the isomorphism 

R[x]/(x^2 + 1) ≅ C. 

We are going to apply quotient rings to prove some interesting results: for every 
polynomial f(x) ∈ k[x], where k is a field, there exists a field extension E/k 
containing all the roots of f; we will also be able to prove the existence of 
finite fields other than F_p. 

Characteristics 

Contemplating “any field” seems quite daunting, and so it makes sense for us to 
begin classifying fields. First of all, fields come in two types: those that contain 
a subfield isomorphic to Q, and those that contain a subfield isomorphic to F_p 
for some prime p. 

Recall the definition of na on page 160, where n ∈ Z and a is an element 
of a commutative ring R. For example, 3a means a + a + a and (-3)a means 
-a - a - a. More generally, if n is a nonnegative integer, then na means 

a + a + ··· + a   (n times), 

0a = 0, and -na is the sum of |n| copies of -a. Note that na is a hybrid in the 
sense that it is the product of an integer and an element of R, not the product of 
two ring elements. However, na can be viewed as the product of two elements 
in R, for if e is the multiplicative identity in R, then ne ∈ R and na = (ne)a. 
In particular, 3a = a + a + a = (e + e + e)a = (3e)a. This “action” of Z 
on R is really a homomorphism. 

Lemma 7.14. If R is a commutative ring with multiplicative identity e, then 
the function χ: Z → R, given by 

χ(n) = ne, 

is a homomorphism. 

Proof. Exercise 7.25 on page 293. ■ 

Proposition 7.15. If k is a field and χ: Z → k is the map χ: n ↦ ne, where 
e is the multiplicative identity in k, then either im χ ≅ Z or im χ ≅ F_p for 
some prime p. 

Proof. Since every ideal in Z is principal, ker χ = (m) for some integer 
m ≥ 0. If m = 0, then χ is an injection, and im χ ≅ Z. If m ≠ 0, the First 
Isomorphism Theorem gives Z_m = Z/(m) ≅ im χ ⊆ k. Since k is a field, 
im χ is a domain, and so m is prime (Exercise 5.3 on page 195). Writing p 
instead of m, we have im χ ≅ Z_p = F_p. ■ 

Corollary 7.16. Every field k contains a subfield isomorphic to either Q or F_p. 

Proof. Proposition 7.15 shows that k contains a subring isomorphic to Z or 
to F_p for some prime p. If the subring is Z, then, because the field k contains 
multiplicative inverses for all of its nonzero elements, it contains an isomorphic 
copy of Q = Frac(Z). More precisely, Exercise 5.38(ii) on page 212 says 
that a field containing an isomorphic copy of Z as a subring must contain an 
isomorphic copy of Q. ■ 

By Exercise 7.28(i) on page 293, k can’t contain an isomorphic copy of 
both Q and F_p; by Exercise 7.28(ii) on page 293, k can’t contain copies of F_p 
and F_q for distinct primes p and q. 

Definition. A field has characteristic 0 if ker χ = (0); it has characteristic p 
if ker χ = (p) for some prime p. 

This distinction is the first step in classifying different types of fields. 

The fields Q, R, C, and C(x) have characteristic 0, as do any of their subfields. 
Every finite field has characteristic p for some prime p (after all, if 
ker χ = (0), then im χ ≅ Z is infinite); F_p(x), the field of all rational functions 
over F_p, is an infinite field of characteristic p. 
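
One way to get a feel for the definition is to compute a characteristic by brute force. The following Python sketch is an illustration of ours, not part of the text's development: the helper names `characteristic` and `char_of_Fp` and the cutoff `bound` are assumptions made for the example. It adds the identity to itself until the sum is 0, which amounts to searching for the positive generator of ker χ. When no multiple of the identity vanishes below the cutoff it returns 0, and that is only evidence of characteristic 0, not a proof.

```python
from fractions import Fraction


def characteristic(one, add, zero, bound=10_000):
    """Least n >= 1 with n*one == zero, or 0 if none is found up to `bound`.

    `one`, `add`, and `zero` describe the field; `bound` is a hypothetical
    cutoff chosen for this sketch.
    """
    total = zero
    for n in range(1, bound + 1):
        total = add(total, one)       # total is now n * one
        if total == zero:
            return n
    return 0                          # no positive multiple vanished below the cutoff


def char_of_Fp(p):
    # F_p modelled as the integers mod p.
    return characteristic(1, lambda a, b: (a + b) % p, 0)


print(char_of_Fp(7))                                            # 7
print(characteristic(Fraction(1), lambda a, b: a + b, Fraction(0)))  # 0
```

The same function works for any model of a field in which you can add and test equality; only the three arguments change.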

Proposition 7.17. Let k be a field of characteristic p > 0. 

(i) pa = 0 for all a ∈ k. 

(ii) If q = p^n, then (a + b)^q = a^q + b^q for all a, b ∈ k. 

(iii) If k is finite, then φ: k → k, given by 

φ: a ↦ a^p, 

is an isomorphism. 

Exercise 5.36 on page 212 proves a congruence version of (ii) for a, b ∈ Z. 

Proof. (i) Since k has characteristic p, we have ker(χ) = (p); that is, χ(p) = 
p1 = 0 (we have reverted to our usual notation, so that 1 denotes the 
multiplicative identity). But the hybrid product pa can be viewed as a product 
of two ring elements: pa = (p1)a = 0a = 0. 

(ii) Expand (a + b)^p by the Binomial Theorem, and note that p | \binom{p}{j} for 
all 1 ≤ j ≤ p - 1. By (i), all the inside terms vanish. The argument is 
completed by induction on n ≥ 1. 

(iii) It is obvious that φ(1) = 1 and 

φ(ab) = (ab)^p = a^p b^p = φ(a)φ(b). 

By (ii), φ(a + b) = φ(a) + φ(b). Therefore, φ is a homomorphism. Since 
ker φ is a proper ideal in k (for 1 ∉ ker φ), we have ker φ = (0), because k 
is a field, and so φ is an injection. Finally, since k is finite, the Pigeonhole 
Principle applies, and φ is an isomorphism. ■ 
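
The two facts behind part (ii) are easy to test numerically. Here is a small Python sketch, ours rather than the text's, with the function names invented for the illustration. It checks that p divides every inside binomial coefficient and that a ↦ a^p is additive on the integers mod p. On F_p itself the check is weak, because a^p = a there, so the composite moduli 4 and 6 are included to show how both properties fail when the modulus is not prime.

```python
from math import comb


def inner_binomials_divisible(m):
    """True when m divides C(m, j) for every 1 <= j <= m - 1."""
    return all(comb(m, j) % m == 0 for j in range(1, m))


def additive_mth_power(m):
    """True when (a + b)^m = a^m + b^m for all a, b in Z_m (integers mod m)."""
    return all(
        pow(a + b, m, m) == (pow(a, m, m) + pow(b, m, m)) % m
        for a in range(m)
        for b in range(m)
    )


# Primes first, then two composite moduli for contrast.
for m in (2, 3, 5, 7, 4, 6):
    print(m, inner_binomials_divisible(m), additive_mth_power(m))
```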

We have seen finite fields F_p with p elements, for every prime p, and in 
Exercise 4.55 on page 165, we saw a field F_4 with exactly four elements. The 
next result shows that the number of elements in a finite field must be a prime 
power; there is no field having exactly 15 elements. Theorem 7.38 will show, 
for every prime p and every integer n ≥ 1, that there exists a field having 
exactly p^n elements. 

Example A.20(iv) in the Appendices shows that if a commutative ring R 
contains a subring k that is a field, then R is a vector space over k: vectors are 
elements r ∈ R, while scalar multiplication by a ∈ k is the given multiplication 
ar of elements in R. The vector space axioms are just some of the axioms 
in the definition of commutative ring. 

Proposition 7.18. If K is a finite field, then |K| = p^n for some prime p and 
some n ≥ 1. 

Proof. The prime field of K is isomorphic to F_p for some prime p, by 
Proposition 7.15. As we remarked, K is a vector space over F_p; as K is finite, it is 
obviously finite-dimensional. If dim_{F_p}(K) = n, then |K| = p^n, by 
Corollary A.34 in the Appendix. ■ 

If K is a vector space over k, its dimension is denoted by dim_k(K) or, 
more briefly, by dim(K). 

Extension Fields 

The Fundamental Theorem of Algebra states that every nonconstant polynomial 
in C[x] is a product of linear polynomials in C[x]; that is, C contains all 
the roots of every polynomial in C[x]. Using ideas similar to those allowing us 
to view C as a quotient ring, we’ll prove Kronecker’s Theorem, a local analog 
of the Fundamental Theorem of Algebra for polynomials over an arbitrary 
field k: given f(x) ∈ k[x], there is some field E containing k as a subfield that 
also contains all the roots of f. (We call this a local analog, for even though 
the larger field E contains all the roots of the polynomial f, it may not contain 
roots of some other polynomials in k[x].) In fact, we’ll see how to construct 
such an E making basic use of quotient rings of the form k[x]/I, where k is a 
field. 

Proposition 7.19. If k is a field and I = (f), where f(x) ∈ k[x] is nonconstant, 
then the following are equivalent: 

(i) f is irreducible 

(ii) k[x]/I is a field 

(iii) k[x]/I is a domain. 

Theorem 4.43 says that Z_m is a field if and only if m is a prime in Z; 
Proposition 7.19 is the analog for k[x]. 

Proof. (i) ⇒ (ii) Since k[x] is a PID, this follows at once from 
Proposition 7.13. 

(ii) ⇒ (iii) Every field is a domain. 

(iii) ⇒ (i) Assume that k[x]/I is a domain. If f is not irreducible, then 
there are g(x), h(x) ∈ k[x] with f = gh, where deg(g) < deg(f) and 
deg(h) < deg(f). Recall that the zero in k[x]/I is 0 + I = I. Thus, if 
g + I = I, then g ∈ I = (f) and f | g, contradicting deg(g) < deg(f). 
Similarly, h + I ≠ I. However, the product (g + I)(h + I) = f + I = I 
is zero in the quotient ring, which contradicts k[x]/I being a domain. 
Therefore, f is irreducible. ■ 
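
For a small field k, you can watch Proposition 7.19 at work by brute force. The sketch below is our own illustration and makes several assumptions: k = F_2, a polynomial is stored as a Python integer whose bit i is the coefficient of x^i, and the helper names are invented for the example. It builds the finite ring F_2[x]/(f) and tests directly whether it is a field and whether it is a domain.

```python
def poly_mod(a, f):
    """Remainder of a on division by f in F_2[x] (bit i = coefficient of x^i)."""
    deg_f = f.bit_length() - 1
    while a.bit_length() - 1 >= deg_f:
        # Cancel the leading term of a with a shifted copy of f.
        a ^= f << (a.bit_length() - 1 - deg_f)
    return a


def poly_mul(a, b, f):
    """Product of the cosets of a and b in F_2[x]/(f)."""
    product = 0
    while b:
        if b & 1:
            product ^= a          # addition in F_2[x] is bitwise XOR
        a <<= 1
        b >>= 1
    return poly_mod(product, f)


def is_field(f):
    """Does every nonzero coset in F_2[x]/(f) have a multiplicative inverse?"""
    size = 1 << (f.bit_length() - 1)          # 2^deg(f) cosets
    return all(
        any(poly_mul(a, b, f) == 1 for b in range(1, size))
        for a in range(1, size)
    )


def is_domain(f):
    """Is the product of two nonzero cosets always nonzero?"""
    size = 1 << (f.bit_length() - 1)
    return all(
        poly_mul(a, b, f) != 0
        for a in range(1, size)
        for b in range(1, size)
    )


print(is_field(0b111), is_domain(0b111))   # x^2 + x + 1: True True
print(is_field(0b101), is_domain(0b101))   # x^2 + 1 = (x + 1)^2: False False
```

The quotient by x^2 + x + 1 is a field with four elements, while in the quotient by x^2 + 1 the coset of x + 1 squares to zero.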

The structure of general quotient rings R/I can be complicated, but for 
special choices of R and I, the commutative ring R/I can be easily described. 
For example, when k is a field and p(x) ∈ k[x] is an irreducible polynomial, 
the following proposition gives a complete description of the field R/I = 
k[x]/(p), and it shows how to build a field K in which p(x) has a root. 

This section will be using 
various facts about dimen- 
sion, and you may wish 
to look in Appendix A.3 to 
refresh your memory. 




290 Chapter 7 Quotients, Fields, and Classical Problems 


If we view k as a subfield of K, then it makes sense to speak of a root of p in K. 

Proposition 7.20. Let k be a field and K = k[x]/(p), where p(x) ∈ k[x] is a 
monic irreducible polynomial of degree d and I = (p). 

(i) K is a field, and 

k' = {a + I : a ∈ k} 

is a subfield of K isomorphic to k. If k' is identified with k via a ↦ a + I, 
then k is a subfield of K. 

(ii) z = x + I is a root of p in K. 

(iii) If g(x) ∈ k[x] and z is a root of g in K, then p | g in k[x]. 

(iv) p is the unique monic irreducible polynomial in k[x] having z as a root. 

(v) K is a vector space over k, the list 1, z, z^2, ..., z^{d-1} is a basis, and 

dim_k(K) = d. 


Proof. (i) Since p is irreducible, Proposition 7.19 says that the quotient ring 
K = k[x]/I is a field, while Corollary 5.32 on page 220 says that the 
natural map a ↦ a + I restricts to an isomorphism k → k'. 

(ii) Let p(x) = a_0 + a_1 x + ··· + a_{d-1} x^{d-1} + x^d, where a_i ∈ k for all i. 
In light of the identification of k and k' in (i), we may view p(x) as 
Σ_j (a_j + I)x^j. Hence, since z = x + I, 

p(z) = (a_0 + I) + (a_1 + I)z + ··· + (1 + I)z^d 
     = (a_0 + I) + (a_1 + I)(x + I) + ··· + (1 + I)(x + I)^d 
     = (a_0 + I) + (a_1 x + I) + ··· + (x^d + I) 
     = a_0 + a_1 x + ··· + x^d + I 
     = p + I = I, 

because I = (p). But I = 0 + I is the zero element of K = k[x]/I; 
thus, p(z) = 0 and z is a root of p. 

(iii) If p ∤ g in k[x], then gcd(g, p) = 1 because p is irreducible. Therefore, 
there are s, t ∈ k[x] with 1 = sp(x) + tg(x). Since k[x] ⊆ K[x], we may 
regard this as an equation in K[x]. Setting x = z gives the contradiction 
1 = 0. 

(iv) Let h(x) ∈ k[x] be a monic irreducible polynomial having z as a root. By 
part (iii), we have p | h. Since h is irreducible, we have h = cp for some 
constant c; since h and p are monic, we have c = 1 and h = p. 

(v) Example A.20 in the Appendices shows that K is a vector space over k. 
Every element of K has the form f(x) + I, where f ∈ k[x]. By the 
Division Algorithm, there are polynomials q, r ∈ k[x] with f = qp + r 
and either r = 0 or deg(r) < d = deg(p). Since f - r = qp ∈ I, it 
follows that f + I = r + I. Let r(x) = b_0 + b_1 x + ··· + b_{d-1} x^{d-1}, where 
b_i ∈ k for all i. As in (ii), we see that r + I = b_0 + b_1 z + ··· + b_{d-1} z^{d-1}. 
Therefore, 1, z, z^2, ..., z^{d-1} spans K. 

To see that the list is linearly independent, suppose that 

Σ_{i=0}^{d-1} c_i z^i = 0 in K = k[x]/(p); 

lifting to k[x], this says that 

Σ_{i=0}^{d-1} c_i x^i ≡ 0 mod (p) in k[x], 

so that p | Σ_{i=0}^{d-1} c_i x^i in k[x]. But deg(p) = d, so that all c_i = 0. ■ 

Definition. If K is a field containing k as a subfield, then K is called an extension 
field of k, and we write “K/k is an extension field.” An extension field 
K/k is a finite extension if K is a finite-dimensional vector space over k. The 
dimension of K, denoted by 

[K : k], 

is called the degree of K/k. 

Corollary 7.21. If k(z)/k is a field extension, where z is a root of an irreducible 
polynomial p(x) ∈ k[x], then 

[k(z) : k] = deg(p). 

Proof. This is just a restatement of Proposition 7.20(v). ■ 

Corollary 7.21 shows why [K : k] is called the degree of K/k. 


How to Think About It. At first glance, many people see Proposition 7.20 
as a cheat: we cook up a field that contains a root of p(x) by reducing mod p. 
The root is thus a coset, not a “number.” But, just as mathematicians gradually 
came to see the elements of C as numbers (through their constant use in calcu- 
lations), one can develop a feel for arithmetic in k[x]/ (p) in which the cosets 
become concrete objects in their own right, as in the next example. 


Example 7.22. Suppose that k = Q and 

p(x) = x^3 + x^2 - 2x - 1. 

You can check that p is irreducible (it’s a cubic without a rational root (why?), 
so it can’t factor). Now 

K = Q[x]/(x^3 + x^2 - 2x - 1) 

is a field, [K : Q] = 3, and a basis for K over Q is 

1 + (p), x + (p), x^2 + (p). 

Hence, every element of K can be represented by a quadratic expression 

a + bx + cx^2 + (p), 

where a, b, c ∈ Q. The expression is unique, because p is a cubic: if two 
quadratics f, g ∈ Q[x] are congruent mod (p), then p | f - g; but either 
f - g = 0 or f - g is a nonzero polynomial with deg(f - g) ≤ 2. The latter 
cannot occur, since a nonzero multiple of the cubic p has degree at least 3, 
and so f = g. So, let’s drop the “+(p)” decoration and just represent an 
element of K by the unique quadratic polynomial in its congruence class mod p. 
Using this convention, the elements of K are thus named by quadratic 
polynomials in Q[x]. 

The notation K/k should not be confused with the notation for a quotient ring, 
for a field K has no interesting ideals: in particular, if k ⊊ K, then k is not 
an ideal in K. 

Eventually, we dropped the bracket notation for congruence classes, 
abbreviating [a] to a. 

What about the arithmetic? Just as in C = R[x]/(x^2 + 1), calculations in K 
are carried out by calculating in Q[x], dividing by p, and taking the remainder. 
Indeed, because x^3 + x^2 - 2x - 1 = 0 in K, we have an equation in K, 

x^3 = -x^2 + 2x + 1. 

Hence, to calculate in K, we calculate in Q[x] with the additional simplification 
rule that x^3 is replaced by -x^2 + 2x + 1. Beginning to sound familiar? 
So, for example, here are some calculations in K: 

(i) Addition looks just the same as in Q[x] because addition doesn’t increase 
degree: 

(a + bx + cx^2) + (d + ex + fx^2) 

= (a + d) + (b + e)x + (c + f)x^2. 

(ii) Multiplication requires a simplification. For example, in K, 

(3 + 2x + 4x^2)(-1 + 5x + 7x^2) = 3 + 53x + 77x^2, 

a fact that you can verify (by hand or CAS) by expanding the left-hand 
side and reducing mod (p). 

In general, expand 

(a + bx + cx^2)(d + ex + fx^2) 

as 

cfx^4 + (bf + ce)x^3 + (af + be + cd)x^2 + (ae + bd)x + ad, 

and then simplify, replacing occurrences of x^3 by -x^2 + 2x + 1: 

= cfx(x^3) + (bf + ce)x^3 + (af + be + cd)x^2 + (ae + bd)x + ad 
= cfx(-x^2 + 2x + 1) + (bf + ce)(-x^2 + 2x + 1) 
  + (af + be + cd)x^2 + (ae + bd)x + ad 
= etc. 

CAS environments will do all of this work for you: just ask for the remainder when 
a product is divided by p. 


A little practice with such calculations gives you the feeling that you are indeed 
working with “numbers” in a system and, if K had any use, you’d soon become 
very much at home in it just as our Renaissance predecessors became at home 
in C. ▲ 
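
If you have no CAS at hand, a few lines of Python will do the reduction. The following is a minimal sketch of ours, not the book's: the names `P`, `reduce_mod`, and `multiply` are invented for the illustration, and an element a + bx + cx^2 of K is stored as the coefficient list [a, b, c]. It multiplies in Q[x] and then takes the remainder on division by p, which is the simplification rule x^3 = -x^2 + 2x + 1 applied repeatedly from the top degree down.

```python
from fractions import Fraction

# p(x) = x^3 + x^2 - 2x - 1, coefficients listed from the constant term up.
P = [Fraction(-1), Fraction(-2), Fraction(1), Fraction(1)]


def reduce_mod(f, p=P):
    """Remainder of f on division by the monic polynomial p."""
    f = [Fraction(c) for c in f]
    d = len(p) - 1                       # degree of p
    while len(f) > d:
        lead = f.pop()                   # leading coefficient of f
        shift = len(f) - d               # subtract lead * x^shift * p
        for i in range(d):
            f[shift + i] -= lead * p[i]
    return f + [Fraction(0)] * (d - len(f))


def multiply(f, g, p=P):
    """Product in K = Q[x]/(p) of two elements given by coefficient lists."""
    product = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            product[i + j] += Fraction(a) * Fraction(b)
    return reduce_mod(product, p)


# (3 + 2x + 4x^2)(-1 + 5x + 7x^2) in K; the coefficients come out as 3, 53, 77.
print([int(c) for c in multiply([3, 2, 4], [-1, 5, 7])])   # [3, 53, 77]
```

Exact rational arithmetic (`Fraction`) is used so that nothing is lost to rounding; the same two functions work for any monic p once `P` is changed.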


You could use Cardano’s formula to find expressions for the roots of p(x). Why 
not try it? 

While we’ve constructed a field extension K/k in which p(x) ∈ k[x] has 
a root α, we have little idea about what that root is, even when k = R and α 
is a complex number. For example, if p(x) = x^2 + 1, we can’t tell whether 
α = i or α = -i. Proposition 7.20 doesn’t give you a way to find roots; 
it just gives you a way to construct an extension field containing k and in 
which the operations behave as if p(x) is 0. Playing with these operations 
might actually give you some ideas about the three complex numbers that make 
p(x) = x^3 + x^2 - 2x - 1 equal to 0 in C; see Example 7.22 above and 
Exercises 7.29 and 7.30 below. 

Exercises 

7.25 * Prove Lemma 7.14. 

7.26 * If X is a subset of a field k, then ⟨X⟩, the subfield generated by X, is the 
intersection of all the subfields containing X (by Exercise 4.61(iii) on page 168, 
the intersection of any family of subfields of k is itself a subfield of k). 

(i) Prove that ⟨X⟩ is the smallest such subfield in the sense that any subfield F 
containing X must contain ⟨X⟩. 

(ii) Define the prime field of a field k to be the intersection of all the subfields 
of k. Prove that the prime field of k is the subfield generated by 1. 

(iii) Prove that the prime field of a field is isomorphic to either Q or F_p. 

7.27 * If k is a field of characteristic p > 0 and a ∈ k, prove that 

(x + a)^p = x^p + a^p. 

Hint. Use Proposition 7.17 and the Binomial Theorem. 

7.28 Let R be a commutative ring, and let p, q be distinct primes. 

(i) Prove that R cannot have subrings A and B with A ≅ Q and B ≅ F_p. 

(ii) Prove that R cannot have subrings A and B with A ≅ F_p and B ≅ F_q. 

(iii) Why doesn’t the existence of R = F_p × F_q contradict part (ii)? (Exercise 5.54 
on page 221 defines the direct product of rings.) 

7.29 * As in Example 7.22, let p(x) = x^3 + x^2 - 2x - 1 and let K = Q[x]/(p). Let 

α = x, β = x^2 - 2, and γ = x^3 - 3x = -x^2 - x + 1. 

Calculate in K, writing each result as a + bx + cx^2: 

(i) α + β + γ 

(ii) αβ + αγ + βγ 

(iii) αβγ. 

7.30 * As in Example 7.22, let p(x) = x^3 + x^2 - 2x - 1 and let K = Q[x]/(p). 
Show that, in K, 

p(x^2 - 2) = p(x^3 - 3x) = 0. 

Hence the three roots of p in K are x + (p), x^2 - 2 + (p), and x^3 - 3x + (p). 

Algebraic Extensions 

The first step in classifying fields is by their characteristics. Here is the second 
step: we define algebraic extensions. 

Definition. Let K/k be an extension field. An element z ∈ K is algebraic 
over k if there is a nonzero polynomial f(x) ∈ k[x] having z as a root; otherwise, 
z is transcendental over k. A field extension K/k is algebraic if every 
z ∈ K is algebraic over k. 

When a real number is called transcendental, it usually means that it is 
transcendental over Q. For example, it was proved by Lindemann that π is 
a transcendental number (see [15], pp. 47-57 or [3], p. 5); there is no nonzero 
f(x) ∈ Q[x] with f(π) = 0. 

Proposition 7.23. If K/k is a finite extension field, then K/k is an algebraic 
extension. 

Proof. By definition, K/k finite means that K has finite dimension n as a 
vector space over k. Suppose that z is an element of K. By Corollary A.39 in 
Appendix A.3, the list of n + 1 vectors 1, z, z^2, ..., z^n is linearly dependent: 
there are c_0, c_1, ..., c_n ∈ k, not all 0, with Σ_i c_i z^i = 0. Thus, the polynomial 
f(x) = Σ_i c_i x^i is not the zero polynomial, and z is a root of f. Therefore, z 
is algebraic over k. ■ 

The converse of the last proposition is not true; the field A of all complex 
numbers algebraic over Q is an algebraic extension of Q that is not a finite 
extension. (The fact that A is a field is not obvious, but it is true (see [17], 
Chapter 6).) 

Definition. If K/k is an extension field and z ∈ K, then k(z), the subfield of 
K obtained by adjoining z to k, is the intersection of all those subfields of K 
containing k and z. 

More generally, if A is a subset of K, define k(A) to be the intersection of 
all the subfields of K containing k ∪ A; we call k(A) the subfield of K obtained 
by adjoining A to k. In particular, if A = {z_1, ..., z_n} is a finite subset, then 
we may denote k(A) by k(z_1, ..., z_n). 

In Exercise 7.43 on page 308, you’ll show that k(A) is the smallest subfield 
of K containing k and A; that is, if E is any subfield of K containing k and A, 
then k(A) ⊆ E. 

Proposition 7.20 starts with an irreducible polynomial p(x) ∈ k[x] and 
constructs an extension K/k in which p has a root. Suppose we start with the 
root; that is, suppose that z is algebraic over k. Can we find a polynomial p 
so that k(z) (the smallest extension of k that contains z) can be realized as 
k[x]/(p)? Let’s look at an example. 

Example 7.24. Suppose that K = R, k = Q, and z = √2 + √3. First of 
all, z is algebraic over Q. To see this, proceed as you would in high school 
algebra. 

z^2 = 5 + 2√6, 
(z^2 - 5)^2 = 24, 
z^4 - 10z^2 + 1 = 0. 

Hence, z is a root of h(x) = x^4 - 10x^2 + 1, so it is algebraic over Q. 

Consider the evaluation homomorphism ψ: Q[x] → R (provided by 
Theorem 5.19) given by 

f(x) ↦ f(z). 

The First Isomorphism Theorem suggests that we look at im ψ and ker ψ. 

• im ψ contains Q (because ψ(a) = a for all a ∈ Q) and z (because ψ(x) = z). 
It follows that any subfield of R that contains Q and z contains im ψ. In other 
words, 

im ψ = Q(z). 

• If I = ker ψ, then the First Isomorphism Theorem gives 

Q[x]/I ≅ im ψ. 

• im ψ is a subring of R, so it is a domain. And I is an ideal in Q[x], so it is 
principal, say I = (p), where p(x) ∈ Q[x]. 

• Furthermore, since Q[x]/I is a domain, p is an irreducible polynomial in 
Q[x], which we can take to be monic. 

Thus, we have an isomorphism: 

Ψ: Q[x]/(p) ≅ im ψ, 

namely Ψ: f(x) + (p) ↦ f(z). Since im ψ = Q(z), we have 

Q[x]/(p) ≅ Q(z). 

There it is: Q(z) is realized as a quotient of Q[x] by an irreducible polynomial. 
And, because we started with a specific z, we can do better: we can find p. 
We’ll see later, in Example 7.32, that p(x) = h(x). ▲ 
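
The high school algebra in Example 7.24 can be checked exactly, with no square roots approximated. In the following Python sketch (our illustration; the representation and the names `mul` and `h` are assumptions made for the example), the number a + b√2 + c√3 + d√6 is stored as the tuple (a, b, c, d), and products are computed from √2·√3 = √6, √2·√6 = 2√3, and √3·√6 = 3√2.

```python
def mul(u, v):
    """Product of a1 + b1*sqrt2 + c1*sqrt3 + d1*sqrt6 and a2 + b2*sqrt2 + ..."""
    a1, b1, c1, d1 = u
    a2, b2, c2, d2 = v
    return (
        a1 * a2 + 2 * b1 * b2 + 3 * c1 * c2 + 6 * d1 * d2,   # rational part
        a1 * b2 + b1 * a2 + 3 * (c1 * d2 + d1 * c2),         # coefficient of sqrt2
        a1 * c2 + c1 * a2 + 2 * (b1 * d2 + d1 * b2),         # coefficient of sqrt3
        a1 * d2 + d1 * a2 + b1 * c2 + c1 * b2,               # coefficient of sqrt6
    )


def h(t):
    """h(t) = t^4 - 10 t^2 + 1, evaluated in the same representation."""
    t2 = mul(t, t)
    t4 = mul(t2, t2)
    one = (1, 0, 0, 0)
    return tuple(x4 - 10 * x2 + x0 for x4, x2, x0 in zip(t4, t2, one))


z = (0, 1, 1, 0)          # z = sqrt2 + sqrt3
print(mul(z, z))          # (5, 0, 0, 2), that is, 5 + 2*sqrt6
print(h(z))               # (0, 0, 0, 0)
```

A zero tuple certainly represents 0, so the last line confirms that z is a root of h; it says nothing yet about whether h is the minimal polynomial.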

Example 7.24 contains most of the ideas of the general result. 

Theorem 7.25. (i) If K/k is an extension field and z ∈ K is algebraic 
over k, then there is a unique monic irreducible polynomial p(x) ∈ k[x] 
having z as a root. Moreover, if I = (p), then k[x]/I ≅ k(z); indeed, 
there exists an isomorphism 

Ψ: k[x]/I → k(z) 

with Ψ(x + I) = z and Ψ(c + I) = c for all c ∈ k. 

(ii) If z' ∈ K is another root of p(x), then there is an isomorphism 

θ: k(z) → k(z') 

with θ(z) = z' and θ(c) = c for all c ∈ k. 

Proof. (i) As in Example 7.24, consider the evaluation homomorphism 
ψ: k[x] → K, given by Theorem 5.19: 

ψ: f ↦ f(z). 

Now im ψ is the subring of K consisting of all polynomials in z, that is, 
all elements of the form f(z) with f ∈ k[x], while ker ψ is the ideal 
in k[x] consisting of all those g(x) ∈ k[x] having z as a root. Since 
every ideal in k[x] is a principal ideal, we have ker ψ = (p) for some 
monic polynomial p(x) ∈ k[x]. But the First Isomorphism Theorem says 
that k[x]/(p) ≅ im ψ, which is a domain, and so p is irreducible, by 
Proposition 7.19. The same proposition says that k[x]/(p) is a field, and 
so there is an isomorphism Ψ: k[x]/(p) ≅ im ψ; namely Ψ: f(x) + I ↦ 
ψ(f) = f(z). Hence, im Ψ is a subfield of K containing k and z. But 
every such subfield of K must contain im ψ, so that im Ψ = im ψ = 
k(z). We have proved everything in the statement except the uniqueness 
of p; but this follows from Proposition 7.20(iv). 

(ii) By (i), there are isomorphisms Ψ: k[x]/I → k(z) and Ψ': k[x]/I → 
k(z') with Ψ(c + I) = c = Ψ'(c + I) for all c ∈ k; moreover, Ψ: x + I ↦ 
z and Ψ': x + I ↦ z'. The composite θ = Ψ' ∘ Ψ^{-1} is the desired 
isomorphism; it satisfies θ(c) = c for all c ∈ k, and 

θ: f(z) ↦ f(z'), 

for all f(x) ∈ k[x]. ■ 

The proof of Theorem 7.25(ii) is described by the following diagram. 

[Diagram: a commutative triangle with Ψ: k[x]/(p) → k(z), 
Ψ': k[x]/(p) → k(z'), and θ = Ψ' ∘ Ψ^{-1}: k(z) → k(z').] 

Definition. If K/k is an extension field and z ∈ K is algebraic over k, then 
the unique monic irreducible polynomial p(x) ∈ k[x] having z as a root is 
called the minimal polynomial of z over k, and it is denoted by 

p(x) = irr(z, k). 

The minimal polynomial irr(z, k) depends on k. For example, irr(i, R) = 
x^2 + 1, while irr(i, C) = x - i. 

Example 7.26. We know that i ∈ C is algebraic over R, and irr(i, R) = 
x^2 + 1. Now -i is another root of x^2 + 1. The isomorphism θ: C = R(i) → 
R(-i) = C with θ(i) = -i and θ(c) = c for all c ∈ R is, of course, complex 
conjugation. 

The adjunction of one root of irr(i, R) also adjoins the other root, for the 
minimal polynomial here is quadratic. But this doesn’t always happen. For 
example, z = ∛5 is algebraic over Q with minimal polynomial x^3 - 5. 
Theorem 7.25 tells us that 

Ψ: Q[x]/(x^3 - 5) → Q(z), 

given by f(x) + (x^3 - 5) ↦ f(∛5), is an isomorphism. But the roots of 
x^3 - 5 are not all contained in Q(z); indeed, the other two roots are not real, 
while Q(z) ⊆ R; in fact, in C[x], 

x^3 - 5 = (x - z)(x - ωz)(x - ω^2 z), 

where ω is our old friend (1/2)(-1 + i√3). Theorem 7.25(i) tells us that the 
fields 

Q(z), Q(ωz), Q(ω^2 z) 

are all isomorphic via isomorphisms that fix Q pointwise. One of these is 

θ: Q(z) → Q(ωz), 

defined as follows; every element of Q(z) is of the form f(z) where f(x) + 
(x^3 - 5) is a coset in Q[x]/(x^3 - 5). Then 

θ(f(z)) = f(ωz). 

Again, a diagram illustrates the work just done. 

[Diagram: a commutative triangle with Ψ: Q[x]/(x^3 - 5) → Q(z), 
Ψ': Q[x]/(x^3 - 5) → Q(ωz), and θ = Ψ' ∘ Ψ^{-1}: Q(z) → Q(ωz).] ▲ 
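
A quick floating-point experiment makes the three roots of x^3 - 5 visible. The Python sketch below is an illustration of ours, with invented names (`roots`, `f`, `theta`) and an arbitrary tolerance; because it uses approximate arithmetic, it is a plausibility check and not a proof. It confirms that z, ωz, and ω^2 z are roots, that only z is real, and that the rule θ(f(z)) = f(ωz) respects the relation z · z^2 = 5.

```python
z = 5 ** (1 / 3)                        # the real cube root of 5
omega = complex(-0.5, 3 ** 0.5 / 2)     # omega = (-1 + i*sqrt(3))/2

roots = [complex(z), omega * z, omega**2 * z]


def f(t):
    """The polynomial x^3 - 5 evaluated at t."""
    return t**3 - 5


def theta(coeffs):
    """Image under theta of c0 + c1*z + c2*z^2, namely c0 + c1*(omega z) + c2*(omega z)^2."""
    w = omega * z
    return sum(c * w**i for i, c in enumerate(coeffs))


for r in roots:
    print(r, abs(f(r)) < 1e-9)

# z * z^2 = 5 in Q(z); the images under theta satisfy the same relation.
print(abs(theta([0, 1, 0]) * theta([0, 0, 1]) - theta([5, 0, 0])) < 1e-9)
```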

The following formula is quite useful, especially when proving a theorem 
by induction on degree. Before reading the proof, you may want to refresh your 
memory by looking at Appendix A.3 on linear algebra. 

Theorem 7.27. Let k ⊆ E ⊆ K be fields, with E/k and K/E finite extension 
fields. Then K/k is a finite extension field, and 

[K : k] = [K : E][E : k]. 

Proof. If A = a_1, ..., a_n is a basis of E over k and B = b_1, ..., b_m is a basis 
of K over E, then it suffices to prove that the list X of all a_i b_j is a basis of K 
over k. 

To see that X spans K, take u ∈ K. Since B is a basis of K over E, there 
are scalars λ_j ∈ E with u = Σ_j λ_j b_j. Since A is a basis of E over k, there 
are scalars μ_{ji} ∈ k with λ_j = Σ_i μ_{ji} a_i. Therefore, u = Σ_{ij} μ_{ji} a_i b_j, and X 
spans K over k. (Check that this makes sense in the special case A = a_1, a_2, a_3 
and B = b_1, b_2.) 

To prove that X is linearly independent over k, assume that there are scalars 
μ_{ji} ∈ k with Σ_{ij} μ_{ji} a_i b_j = 0. If we define λ_j = Σ_i μ_{ji} a_i, then λ_j ∈ E 
and Σ_j λ_j b_j = 0. Since B is linearly independent over E, it follows that 

0 = λ_j = Σ_i μ_{ji} a_i 

for all j. Since A is linearly independent over k, it follows that μ_{ji} = 0 for all 
j and i, as desired. ■ 

There are several classical geometric problems, such as trisecting an arbitrary 
angle with ruler and compass, in which Theorem 7.27 plays a critical role 
(see Section 7.3). 

Example 7.28. We now show how Theorem 7.27, the multiplicativity of degree 
in a tower of extension fields, can be used to calculate degrees; we also 
show, given an extension field E/k, that an explicit basis of E over k can 
sometimes be constructed. We urge you to work through this example carefully; 
it will help make the preceding development much more concrete, and 
you will see how all these ideas come together. 

Let’s return to Exercise 3.56 on page 116 (if you haven’t attempted this 
exercise, you should try it now). It involves ζ = cos(2π/7) + i sin(2π/7), a 
primitive 7th root of unity; note that the powers of ζ are the vertices of a regular 
7-gon in the complex plane. Using Proposition 6.62 and the language we have 
since introduced, we can now say that 

irr(ζ, Q) = Φ_7(x) = x^6 + x^5 + x^4 + x^3 + x^2 + x + 1, 

because Φ_7(x) is irreducible (Theorem 6.68), and so [Q(ζ) : Q] = 6. For any 
nonnegative integer k, we have 

(ζ^k)^{-1} = \overline{ζ^k} = ζ^{7-k}, 

by Theorem 3.32(ii). 

This “tower of fields” is sometimes illustrated with a diagram that displays the 
degrees. 

[Diagram: Q(ζ) at the top, Q at the bottom.] 

We defined α, β, and γ in Exercise 3.56 on page 116 by 

α = ζ + ζ^6 = 2cos(2π/7) 
β = ζ^2 + ζ^5 = 2cos(4π/7) 
γ = ζ^3 + ζ^4 = 2cos(6π/7), 

and we saw that 

α + β + γ = -1 
αβ + αγ + βγ = -2 
αβγ = 1. 

It follows that α, β, and γ are roots of 

x^3 + x^2 - 2x - 1. 

Ah, but this is precisely the irreducible p(x) in Example 7.22. There, you 
constructed a field in which p has a root, but you didn’t know what the roots 
are. Now you know: they are α, β, and γ, all real numbers, determined by 
expressions involving cosines. Furthermore, the construction in Theorem 7.25 
gives a field isomorphic to Q(α). But Q(α) contains all the roots of p; for 
example, 

α^2 = (ζ + ζ^6)^2 
    = ζ^2 + 2ζ^7 + ζ^12 
    = ζ^2 + 2 + ζ^5 
    = β + 2, 

and, hence, 

β = α^2 - 2 ∈ Q(α). 

In the same way, you can expand α^3 to see that 

γ = α^3 - 3α, 

so that γ is also an element of Q(α). 

Since α = ζ + ζ^{-1}, we see that Q(α) is a subfield of Q(ζ). And, since 

[Q(α) : Q] = deg p = 3, 

Theorem 7.27 gives 

[Q(ζ) : Q(α)][Q(α) : Q] = [Q(ζ) : Q]. 

Hence, [Q(ζ) : Q(α)] · 3 = 6, and [Q(ζ) : Q(α)] = 2. Therefore, the extension 
Q(ζ)/Q decomposes into a cubic extension of Q followed by a quadratic 
extension of Q(α). This implies that 

deg(irr(ζ, Q(α))) = 2; 

that is, ζ is a root of a quadratic polynomial with coefficients in Q(α). Finding 
this quadratic is deceptively easy. We have 

ζ + ζ^6 = α and ζζ^6 = ζ^7 = 1. 

Thus, ζ is a root of 

x^2 - αx + 1 ∈ Q(α)[x]. 

There’s another tower, Q ⊆ E ⊆ Q(ζ), with E/Q a quadratic extension 
and Q(ζ)/E a cubic extension (i.e., writing 6 = 2·3 rather than 6 = 3·2). 
We constructed α, β, γ by breaking up the roots of Φ_7 into three sums, each 
a pair of complex conjugates. Instead, let’s try to break the roots up into two 
sums, say δ and ε, each having three terms, and each sum containing just one 
member of every conjugate pair {ζ^j, \overline{ζ^j} = ζ^{7-j}}; further, we’d like both δ + ε 
and δε rational. A little experimenting (and a CAS) leads to defining them like 
this: 

δ = ζ + ζ^2 + ζ^4 
ε = ζ^6 + ζ^5 + ζ^3. 

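Note that $\varepsilon + \delta = -1$, so that $\varepsilon \in \mathbb{Q}(\delta)$. We can now form the tower

\[ \mathbb{Q} \subseteq \mathbb{Q}(\delta) \subseteq \mathbb{Q}(\zeta). \]

You can also check that $\varepsilon\delta = 2$, so that $\varepsilon$ and $\delta$ are the roots of the quadratic
polynomial in $\mathbb{Q}[x]$:

\[ x^2 + x + 2. \]

The roots of this polynomial are, by the quadratic formula, $-\tfrac{1}{2} \pm \tfrac{i\sqrt{7}}{2}$. See
Exercise 7.35 on page 300.

Hence, $[\mathbb{Q}(\delta) : \mathbb{Q}] = 2$ and $[\mathbb{Q}(\zeta) : \mathbb{Q}(\delta)] = 3$. We now have two ways to
decompose the extension $\mathbb{Q}(\zeta)/\mathbb{Q}$, which we draw in the following diagram:

[Diagram: $\mathbb{Q}(\zeta)$ at the top, $\mathbb{Q}(\alpha)$ and $\mathbb{Q}(\delta)$ in the middle, and $\mathbb{Q}$ at the bottom.]

Finally, $\zeta$ must be a root of a cubic polynomial with coefficients in $\mathbb{Q}(\delta)$.
Again, the calculations are deceptively simple:

\[
\begin{aligned}
\zeta + \zeta^2 + \zeta^4 &= \delta, \\
\zeta\zeta^2 + \zeta\zeta^4 + \zeta^2\zeta^4 &= \zeta^3 + \zeta^5 + \zeta^6 = \varepsilon, \\
\zeta\zeta^2\zeta^4 &= \zeta^7 = 1,
\end{aligned}
\]

so that $\zeta$, $\zeta^2$, and $\zeta^4$ are roots of

\[ x^3 - \delta x^2 + \varepsilon x - 1 \in \mathbb{Q}(\delta)[x]. \] ▲

Exercises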

7.31 As usual, let $\zeta_n = \cos(2\pi/n) + i\sin(2\pi/n)$.

(i) Find the minimal polynomial of $\zeta_n$ over $\mathbb{Q}$ for all $n$ between 1 and 10.

(ii) What is the minimal polynomial of $\zeta_p$ over $\mathbb{Q}$ if $p$ is prime?

(iii) What is the minimal polynomial of $\zeta_{p^2}$ over $\mathbb{Q}$ if $p$ is prime?

7.32 If $p$ is a prime, and $\zeta_p = \cos(2\pi/p) + i\sin(2\pi/p)$, show that

\[ [\mathbb{Q}(\zeta_p) : \mathbb{Q}] = p - 1. \]

7.33 Show that $x^2 - 3$ is irreducible in $\mathbb{Q}(\sqrt{2})[x]$.

7.34 Let $k \subseteq K \subseteq E$ be fields. Prove that if $E$ is a finite extension of $k$, then $E$ is a
finite extension of $K$, and $K$ is a finite extension of $k$.

7.35 Show that

(i) $\cos(2\pi/7) + \cos(4\pi/7) + \cos(8\pi/7) = -\tfrac{1}{2}$.

(ii) $\bigl(\sin(2\pi/7) + \sin(4\pi/7) + \sin(8\pi/7)\bigr)^2 = \tfrac{7}{4}$.

7.36 Let $k \subseteq E \subseteq K$ be a tower of fields, and let $z \in K$. Prove that if $k(z)/k$ is finite,
then $[E(z) : E] \le [k(z) : k]$. Conclude that $[E(z) : E]$ is finite.

Hint. Use Proposition 7.20 to obtain an irreducible polynomial $p(x) \in k[x]$; the
polynomial $p$ may factor in $K[x]$.

7.37 Let $K/k$ be an extension field. If $A \subseteq K$ and $u \in k(A)$, prove that there are only
finitely many $a_1, \ldots, a_n \in A$ with $u \in k(a_1, \ldots, a_n)$.

7.38 Let $E/k$ be a field extension. If $v \in E$ is algebraic over $k$, prove that $v^{-1}$ is
algebraic over $k$.

Splitting Fields 

We now prove a result of Kronecker that says that if $f(x) \in k[x]$ is not constant,
where $k$ is a field, then there is some extension field $K/k$ containing all
the roots of $f$.

Theorem 7.29 (Kronecker). If $k$ is a field and $f(x) \in k[x]$ is nonconstant,
there exists an extension field $K/k$ with $f$ a product of linear polynomials
in $K[x]$.

Proof. The proof is by induction on $\deg(f) \ge 1$. If $\deg(f) = 1$, then $f$ is
linear and we can take $K = k$. If $\deg(f) > 1$, write $f = pg$ in $k[x]$, where
$p(x)$ is irreducible. Now Proposition 7.20 provides an extension field $F/k$
containing a root $z$ of $p$. Hence, $p = (x - z)h$, and so $f = pg = (x - z)hg$
in $F[x]$. By induction (since $\deg(hg) < \deg(f)$), there is an extension field
$K/F$ (so that $K/k$ is also an extension field) with $hg$, and hence $f$, a product
of linear factors in $K[x]$. ■


How to Think About It. For the familiar fields $\mathbb{Q}$, $\mathbb{R}$, and $\mathbb{C}$, Kronecker’s
Theorem offers nothing new. The Fundamental Theorem of Algebra says that
every nonconstant $f(x) \in \mathbb{C}[x]$ has a root in $\mathbb{C}$; it follows, by induction on
the degree of $f$, that all the roots of $f$ lie in $\mathbb{C}$; that is, $f(x) = a(x - z_1) \cdots (x - z_n)$,
where $a \in \mathbb{C}$ and $z_j \in \mathbb{C}$ for all $j$. On the other hand, if $k = \mathbb{F}_p$ or
$k = \mathbb{C}(x) = \operatorname{Frac}(\mathbb{C}[x])$, the Fundamental Theorem does not apply. However,
Kronecker’s Theorem does apply to tell us, for any $f(x) \in k[x]$, that there is
always some larger field $K$ containing all the roots of $f$; for example, there
is an extension field $K/\mathbb{C}(x)$ containing $\sqrt{x}$, and there is an extension field
$E/\mathbb{F}_3$ containing the roots of $x^2 - x - 1 \in \mathbb{F}_3[x]$.

A field $F$ is called algebraically closed if every nonconstant polynomial
$f(x) \in F[x]$ has a root in $F$ (for example, $\mathbb{C}$ is algebraically closed). In
contrast, extension fields $K/k$ constructed in Kronecker’s Theorem (that give
roots of only one polynomial at a time) are usually not algebraically closed.
Every field $k$ does have an algebraic closure: there is an algebraic extension
$F/k$ that is algebraically closed (Kronecker’s Theorem is one ingredient of the
proof; see [25], p. 328).


The extension field $K/k$ in Kronecker’s Theorem need not be unique. Indeed,
if $f$ is a product of linear factors in $K[x]$, then it is so over any extension of
$K$. Therefore, let’s consider the “smallest” field in which $f$ is a product of linear
factors. But the lack of uniqueness is not necessarily a consequence of $K$
being too large, as we shall see in Example 7.33.

Definition. If $K/k$ is an extension field and $f(x) \in k[x]$ is nonconstant, then
$f$ splits over $K$ if $f(x) = a(x - z_1) \cdots (x - z_n)$, where $z_1, \ldots, z_n$ are in $K$
and $a \in k$.

An extension field $E/k$ is called a splitting field of $f$ over $k$ if $f$ splits over
$E$, but $f$ does not split over any extension field $F/k$ such that $k \subseteq F \subsetneq E$.

Consider $f(x) = x^2 + 1 \in \mathbb{Q}[x]$. The roots of $f$ are $\pm i$, and so $f$ splits
over $\mathbb{C}$; that is, $f(x) = (x - i)(x + i)$ is a product of linear polynomials in
$\mathbb{C}[x]$. However, $\mathbb{C}$ is not a splitting field of $f$ over $\mathbb{Q}$: there are proper subfields
of $\mathbb{C}$ containing $\mathbb{Q}$ and all the roots of $f$. For example, $\mathbb{Q}(i)$ is such a subfield;
in fact, it is the splitting field of $f$ over $\mathbb{Q}$.

A splitting field of a polynomial $g(x) \in k[x]$ depends on $k$ as well as on $g$.
A splitting field of $x^2 + 1$ over $\mathbb{Q}$ is $\mathbb{Q}(i)$, while a splitting field of $x^2 + 1$ over
$\mathbb{R}$ is $\mathbb{R}(i) = \mathbb{C}$.

Corollary 7.30. If $k$ is a field and $f(x) \in k[x]$, then a splitting field of $f$
over $k$ exists.

Proof. By Kronecker’s Theorem, there is an extension field $K/k$ such that
$f$ splits in $K[x]$; say $f(x) = a(x - z_1) \cdots (x - z_n)$. The subfield $E = k(z_1, \ldots, z_n)$
of $K$ is a splitting field of $f$ over $k$, because a proper subfield of
$E$ must omit some $z_i$. ■

Example 7.31. (i) Let $f(x) = x^n - 1 \in \mathbb{Q}[x]$, and let $E/\mathbb{Q}$ be a splitting
field. If $\zeta = e^{2\pi i/n}$ is a primitive $n$th root of unity, then $\mathbb{Q}(\zeta) = E$ is
a splitting field of $f$, for every $n$th root of unity is a power of $\zeta$, and
$\zeta^j \in \mathbb{Q}(\zeta)$ for all $j$.

(ii) There are $n$ distinct $n$th roots of unity in $\mathbb{C}$, but there may be fewer roots
of unity over fields of characteristic $p$. For example, let $f(x) = x^3 - 1 \in \mathbb{F}_3[x]$.
Since $x^3 - 1 = (x - 1)^3$, by Exercise 7.27 on page 293, we see
that there is only one cube root of unity here. ▲

Note that $x^2 = w$, where $w$ is a root of $w^2 - 10w + 1$; that is,
$w = \tfrac{1}{2}\bigl(10 \pm \sqrt{96}\bigr) = 5 \pm 2\sqrt{6}$.

How to Think About It. When we defined the field $k(A)$ obtained from a
field $k$ by adjoining a set $A$, we assumed that $A \subseteq K$ for some extension field
$K/k$. But suppose no larger field $K$ is given at the outset. For example, can
the roots of $f(x) = x^2 - x - 1 \in \mathbb{F}_3[x]$ be adjoined to $\mathbb{F}_3$? Yes. In light
of Kronecker’s Theorem, there is some field extension $K/\mathbb{F}_3$ containing the
roots of $f$, say $\alpha$, $\beta$; now we do have the larger field, and so $\mathbb{F}_3(\alpha, \beta)$ makes
sense; we can adjoin the roots of $f$ to $\mathbb{F}_3$. Such an extension field $K$ may not
be unique, but we shall see that any two of them are isomorphic.
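To make this concrete, here is a minimal sketch of one such extension field for $f(x) = x^2 - x - 1$ over $\mathbb{F}_3$. It assumes the model $K = \mathbb{F}_3[x]/(f)$ of Proposition 7.20, stores $a + bz$ as the pair `(a, b)` with $z^2 = z + 1$, and the function names are illustrative only.

```python
from itertools import product

P = 3  # we work over F_3


def add(u, v):
    """Add a + b*z and c + d*z coordinatewise mod 3."""
    return ((u[0] + v[0]) % P, (u[1] + v[1]) % P)


def mul(u, v):
    """Multiply (a + b*z)(c + d*z) using the relation z^2 = z + 1."""
    a, b = u
    c, d = v
    return ((a * c + b * d) % P, (a * d + b * c + b * d) % P)


def f(u):
    """Evaluate f(u) = u^2 - u - 1 in K."""
    sq = mul(u, u)
    return ((sq[0] - u[0] - 1) % P, (sq[1] - u[1]) % P)


# All nine elements of K, and the elements that are roots of f.
K = list(product(range(P), repeat=2))
roots = [u for u in K if f(u) == (0, 0)]
print(roots)
```

In this model the two roots are $z$ and $1 - z$; their sum is $1$ and their product is $-1$, matching the coefficients of $f$.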


Example 7.32. Let’s return to Example 7.24, where we saw that

\[ z = \sqrt{2} + \sqrt{3} \]

is a root of $h(x) = x^4 - 10x^2 + 1$, and that $\mathbb{Q}(\sqrt{2} + \sqrt{3})$ can be realized
as a quotient $\mathbb{Q}[x]/(p)$ for some irreducible monic polynomial $p(x) \in \mathbb{Q}[x]$
having $z$ as a root.

As promised, we can now do better: we can show that $p = h$ and that
$E = \mathbb{Q}(\sqrt{2} + \sqrt{3})$ is a splitting field of $h(x) = x^4 - 10x^2 + 1$, as well as a
splitting field of $g(x) = (x^2 - 2)(x^2 - 3)$.

Because $x^4 - 10x^2 + 1$ is a quadratic in $x^2$, we can apply the quadratic
formula to see that if $w$ is any root of $h$, then $w^2 = 5 \pm 2\sqrt{6}$. But the identity
$(\sqrt{a} + \sqrt{b})^2 = a + 2\sqrt{ab} + b$ gives $5 + 2\sqrt{6} = (\sqrt{2} + \sqrt{3})^2$, so that
$w = \pm(\sqrt{2} + \sqrt{3})$ when $w^2 = 5 + 2\sqrt{6}$. Similarly, $5 - 2\sqrt{6} = (\sqrt{2} - \sqrt{3})^2$,
so that $h$ has four distinct roots, namely

\[ z = \sqrt{2} + \sqrt{3}, \quad -\sqrt{2} - \sqrt{3}, \quad \sqrt{2} - \sqrt{3}, \quad -\sqrt{2} + \sqrt{3}. \]

By Theorem 6.52, the only possible rational roots of $h$ are $\pm 1$, and so we have
just proved that all these roots are irrational.

We claim that $h$ is irreducible in $\mathbb{Q}[x]$ (so, $p = h$ after all). It suffices to
show that $h$ has no quadratic factor $q(x) \in \mathbb{Q}[x]$ (why?). If, on the contrary,
$h = qq'$ for two monic quadratic polynomials in $\mathbb{Q}[x]$, then the roots of $h$ are
paired up, two for $q$ and two for $q'$. Suppose $q(z) = 0$. Then the other root of
$q$, call it $z'$, is one of

\[ \sqrt{2} - \sqrt{3}, \quad -\sqrt{2} - \sqrt{3}, \quad -\sqrt{2} + \sqrt{3}. \]

Now, if $q(x) = x^2 + bx + c$, then $-b = z + z'$ and $c = zz'$. But you can check,
for each choice of $z'$, that either $z + z'$ or $zz'$ is irrational. Since $q \in \mathbb{Q}[x]$, this
is a contradiction, and so $h$ is irreducible.

We now know that $[E : \mathbb{Q}] = 4$. Let $F = \mathbb{Q}(\sqrt{2}, \sqrt{3})$, so that we have a
tower of fields $\mathbb{Q} \subseteq E \subseteq F$. Theorem 7.27 tells us that

\[ [F : \mathbb{Q}] = [F : E][E : \mathbb{Q}]. \]

On the other hand,

\[ [F : \mathbb{Q}] = [F : \mathbb{Q}(\sqrt{2})][\mathbb{Q}(\sqrt{2}) : \mathbb{Q}]. \]

Now $[\mathbb{Q}(\sqrt{2}) : \mathbb{Q}] = 2$, because $\sqrt{2}$ is a root of the irreducible quadratic
$x^2 - 2$ in $\mathbb{Q}[x]$. We claim that $[F : \mathbb{Q}(\sqrt{2})] \le 2$. The field $F$ arises by adjoining
$\sqrt{3}$ to $\mathbb{Q}(\sqrt{2})$; either $\sqrt{3} \in \mathbb{Q}(\sqrt{2})$, in which case the degree is 1,
or $x^2 - 3$ is irreducible in $\mathbb{Q}(\sqrt{2})[x]$, in which case the degree is 2 (by Exercise 7.33
on page 300, it is 2). It follows that $[F : \mathbb{Q}] \le 4$, and so the equation
$[F : \mathbb{Q}] = [F : E][E : \mathbb{Q}]$ gives $[F : E] = 1$; that is, $F = E$, so that $F$ not
only arises from $\mathbb{Q}$ by adjoining all the roots of $h$, but it also arises from $\mathbb{Q}$ by
adjoining all the roots of $g(x) = (x^2 - 2)(x^2 - 3)$. ▲
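The "you can check" step in this example can be carried out with exact integer arithmetic. The sketch below is an illustration, not part of the text's argument: it stores $c_0 + c_1\sqrt{2} + c_2\sqrt{3} + c_3\sqrt{6}$ as a 4-tuple, and it assumes that $1, \sqrt{2}, \sqrt{3}, \sqrt{6}$ are linearly independent over $\mathbb{Q}$, so that an element is rational exactly when its last three coordinates vanish. All names are hypothetical.

```python
def add(u, v):
    """Add two elements coordinatewise."""
    return tuple(a + b for a, b in zip(u, v))


def mul(u, v):
    """Multiply in the basis 1, sqrt2, sqrt3, sqrt6."""
    a0, a1, a2, a3 = u
    b0, b1, b2, b3 = v
    return (
        a0 * b0 + 2 * a1 * b1 + 3 * a2 * b2 + 6 * a3 * b3,  # rational part
        a0 * b1 + a1 * b0 + 3 * (a2 * b3 + a3 * b2),        # sqrt2 part
        a0 * b2 + a2 * b0 + 2 * (a1 * b3 + a3 * b1),        # sqrt3 part
        a0 * b3 + a3 * b0 + a1 * b2 + a2 * b1,              # sqrt6 part
    )


def h(w):
    """Evaluate h(w) = w^4 - 10 w^2 + 1."""
    w2 = mul(w, w)
    w4 = mul(w2, w2)
    return add(add(w4, tuple(-10 * c for c in w2)), (1, 0, 0, 0))


def is_rational(u):
    """Rational means the sqrt2, sqrt3, sqrt6 coordinates are all zero."""
    return u[1:] == (0, 0, 0)


z = (0, 1, 1, 0)                                      # sqrt2 + sqrt3
others = [(0, 1, -1, 0), (0, -1, -1, 0), (0, -1, 1, 0)]

for zp in others:
    s, t = add(z, zp), mul(z, zp)
    print(zp, "sum rational:", is_rational(s), "product rational:", is_rational(t))
```

For each candidate $z'$, at least one of $z + z'$ and $zz'$ fails the rationality test, which is what the irreducibility argument needs.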

A splitting field of $f(x) \in k[x]$ is a smallest field extension $E/k$ containing
all the roots of $f$. We say “a” splitting field instead of “the” splitting field
because splitting fields of $f$ over $k$ are not unique. Corollary 7.30 constructed
a splitting field of $f(x) \in k[x]$ as a subfield of a field extension $K/k$, where
$f$ splits in $K$. But there may be distinct such field extensions $K/k$.

Example 7.33. Consider $f(x) = x^2 + x + 1 \in \mathbb{F}_2[x]$. Now $f$ is irreducible
(for it is a quadratic with no roots in $\mathbb{F}_2$), and $f$ is a product of linear polynomials
in $K[x]$, where $K = \mathbb{F}_2[x]/(f)$, by Proposition 7.20: if $z = x + (f) \in K$,
then $f(x) = (x - z)(x - z - 1)$ in $K[x]$ (remember that $-1 = 1$ here, so that the
roots $z$ and $z + 1$ have sum $1$ and product $z^2 + z = 1$). On the other
hand, in Exercise 4.55 on page 165, we constructed a field $K'$ with elements
the four matrices $\begin{bmatrix} a & b \\ b & a+b \end{bmatrix}$ (where $a, b \in \mathbb{F}_2$) and operations matrix addition
and matrix multiplication. You can check that if $u = \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}$, then $u \in K'$ and

\[ f(x) = (x - u)(x - u - 1) \text{ in } K'[x]. \] ▲
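The matrix model $K'$ can be checked by brute force. This is a sketch under two assumptions: the elements of $K'$ are the matrices with rows $(a, b)$ and $(b, a+b)$ over $\mathbb{F}_2$, as in Example 7.41(i), and $u$ is the matrix with $a = 0$, $b = 1$. The helper names are illustrative.

```python
def mat_add(A, B):
    """Add 2x2 matrices over F_2."""
    return tuple(tuple((A[i][j] + B[i][j]) % 2 for j in range(2)) for i in range(2))


def mat_mul(A, B):
    """Multiply 2x2 matrices over F_2."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) % 2 for j in range(2))
        for i in range(2)
    )


def elem(a, b):
    """The matrix with rows (a, b) and (b, a + b), reduced mod 2."""
    return ((a % 2, b % 2), (b % 2, (a + b) % 2))


ZERO = elem(0, 0)
ONE = elem(1, 0)  # the identity matrix
K_prime = [elem(a, b) for a in (0, 1) for b in (0, 1)]


def f(M):
    """Evaluate f(M) = M^2 + M + I."""
    return mat_add(mat_add(mat_mul(M, M), M), ONE)


u = elem(0, 1)
roots = [M for M in K_prime if f(M) == ZERO]
print(roots)
```

The two roots found are $u$ and $u + 1$, so $f$ has two distinct roots in $K'$ and factors there as a product of two different linear polynomials.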

Our next goal is to show that splitting fields are unique up to isomorphism.
We paraphrase Theorem 7.25(ii).

Let $K/k$ be an extension field, and let $z, z' \in K$ be roots of some irreducible
$p(x) \in k[x]$. Then there is an isomorphism $\theta : k(z) \to k(z')$
with $\theta(z) = z'$ and $\theta(c) = c$ for all $c \in k$.

We need a generalization. Suppose that $f(x) \in k[x]$ is a polynomial, not necessarily
irreducible, and let $E = k(z_1, \ldots, z_t)$ and $E' = k(z'_1, \ldots, z'_t)$ be
splitting fields of $f$. Is there an isomorphism $\theta : E \to E'$ that carries the roots
$z_1, \ldots, z_t$ to the roots $z'_1, \ldots, z'_t$ and that fixes all the elements $c \in k$? The
obvious way to proceed is by induction on $\deg(f)$ (making use of the fact that
$f$ has an irreducible factor, which will let us use Theorem 7.25). Think about
proving the inductive step. We’ll have an isomorphism $\varphi : k(z_1) \to k(z'_1)$ that
we’ll want to extend to an isomorphism $\widetilde{\varphi} : k(z_1, z_2) \to k(z'_1, z'_j)$ for some $j$;
that is, $\widetilde{\varphi} : k(z_1)(z_2) \to k(z'_1)(z'_j)$. The base fields $k(z_1)$ and $k(z'_1)$ are no
longer equal; they are only isomorphic. The upshot is that we have to complicate
the statement of what we are going to prove in order to take account of
this.

First, recall Corollary 5.22:

If $R$ and $S$ are commutative rings and $\varphi : R \to S$ is a homomorphism,
then there is a unique homomorphism $\varphi^* : R[x] \to S[x]$ given by

\[ \varphi^* : r_0 + r_1 x + r_2 x^2 + \cdots \mapsto \varphi(r_0) + \varphi(r_1)x + \varphi(r_2)x^2 + \cdots. \]

Moreover, $\varphi^*$ is an isomorphism if $\varphi$ is.

As we said, we are forced to complicate our earlier result.

Lemma 7.34. Let $\varphi : k \to k'$ be an isomorphism of fields and $\varphi^* : k[x] \to k'[x]$
the isomorphism of Corollary 5.22; let $p(x) \in k[x]$ be irreducible, and let
$p' = \varphi^*(p)$.

(i) $p'$ is irreducible in $k'[x]$, and the map $\Phi : k[x]/(p) \to k'[x]/(p')$, defined
by $\Phi : f + (p) \mapsto \varphi^*(f) + (p')$, is an isomorphism of fields.

(ii) Let $K/k$ be a field extension, let $z \in K$ be algebraic over $k$, and let
$p(x) = \operatorname{irr}(z, k)$. If $p' = \varphi^*(p) \in k'[x]$ and $z'$ is a root of $p'$ in some
extension of $k'$, then $\varphi$ can be extended to an isomorphism $k(z) \to k'(z')$
that maps $z$ to $z'$.

Proof. (i) This is straightforward, for $\varphi^*$ carries the ideal $(p)$ in $k[x]$ onto the
ideal $(p')$ in $k'[x]$, and Exercise 7.8 on page 285 applies. Exercise 7.44
on page 308 asks you to give the details.

(ii) By Theorem 7.25(i), there are isomorphisms

\[ \psi : k[x]/(p) \to k(z) \quad\text{and}\quad \psi' : k'[x]/(p') \to k'(z'). \]

By part (i) of this lemma, there is an isomorphism

\[ \Phi : k[x]/(p) \to k'[x]/(p'), \]

and the composite $\psi' \circ \Phi \circ \psi^{-1}$ is the desired isomorphism. ■

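Here is a picture of the Lemma’s proof.

\[
\begin{array}{ccc}
k[x]/(p) & \xrightarrow{\ \Phi\ } & k'[x]/(p') \\
\downarrow{\scriptstyle \psi} & & \downarrow{\scriptstyle \psi'} \\
k(z) & \xrightarrow{\ \psi' \circ \Phi \circ \psi^{-1}\ } & k'(z')
\end{array}
\]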


We now give the version we need.

Theorem 7.35. Let $\varphi : k \to k'$ be an isomorphism of fields and $\varphi^* : k[x] \to k'[x]$
the isomorphism in Lemma 7.34. Let $f(x) \in k[x]$ and $f^*(x) = \varphi^*(f) \in k'[x]$.
If $E$ is a splitting field of $f$ over $k$ and $E'$ is a splitting field of $f^*$ over $k'$,
then there is an isomorphism $\Phi : E \to E'$ extending $\varphi$.

Proof. The proof is by induction on $d = [E : k]$. If $d = 1$, then $f$ is a product
of linear polynomials in $k[x]$, and it follows easily that $f^*$ is also a product of
linear polynomials in $k'[x]$. Therefore, $E' = k'$, and we may set $\Phi = \varphi$.

For the inductive step, choose a root $z$ of $f$ in $E$ that is not in $k$, and let
$p(x) = \operatorname{irr}(z, k)$ be the minimal polynomial of $z$ over $k$. Now $\deg(p) > 1$,
because $z \notin k$; moreover, $[k(z) : k] = \deg(p)$, by Proposition 7.20. Since $p$ divides $f$,
the polynomial $p^* = \varphi^*(p)$ divides $f^*$, which splits over $E'$. Let $z'$ be
a root of $p^*$ in $E'$, so that $p^* = \operatorname{irr}(z', k')$.

By Lemma 7.34(ii), there is an isomorphism $\widetilde{\varphi} : k(z) \to k'(z')$ extending $\varphi$
with $\widetilde{\varphi}(z) = z'$. We may regard $f$ as a polynomial with coefficients in $k(z)$,
for $k \subseteq k(z)$ implies $k[x] \subseteq k(z)[x]$. We claim that $E$ is a splitting field of $f$
over $k(z)$; that is,

\[ E = k(z)(z_1, \ldots, z_n), \]

where $z_1, \ldots, z_n$ are the roots of $f(x)/(x - z)$. After all,

\[ E = k(z, z_1, \ldots, z_n) = k(z)(z_1, \ldots, z_n). \]

Similarly, $E'$ is a splitting field of $f^*$ over $k'(z')$. But $[E : k(z)] < [E : k]$, by
Theorem 7.27, so the inductive hypothesis gives an isomorphism $\Phi : E \to E'$
that extends $\widetilde{\varphi}$ and, hence, $\varphi$. ■

Corollary 7.36. If $k$ is a field and $f(x) \in k[x]$, then any two splitting fields
of $f$ over $k$ are isomorphic via an isomorphism that fixes $k$ pointwise.

Proof. Let $E$ and $E'$ be splitting fields of $f(x)$ over $k$. If $\varphi$ is the identity, then
Theorem 7.35 applies at once. ■

Classification of Finite Fields 

We know, thanks to Proposition 7.18, that every finite field has $p^n$ elements for
some prime $p$ and integer $n \ge 1$. We also know that a field $k$ with $p^n$ elements
must have characteristic $p$, so that $pa = 0$ for all $a \in k$, by Proposition 7.17.
In this section, we show that fields with exactly $p^n$ elements exist, and that any
two having the same number of elements are isomorphic.

First, we show that every nonzero element $a$ in a finite field with $q$ elements
is a $(q - 1)$st root of unity (of course, $a$ is not a complex root of unity). We
have seen the idea of the next proof in the proof of Theorem 4.63.

Lemma 7.37. Let $k$ be a finite field having $q$ elements. If $a \in k$ is nonzero,
then $a^{q-1} = 1$.

Proof. Let $k^\# = \{a_1, a_2, \ldots, a_{q-1}\}$ be the nonzero elements of $k$. We claim,
for any $a \in k^\#$, that the function $\mu_a : a_i \mapsto aa_i$ takes values in $k^\#$: since $k$ is a
field, it is a domain, and so $aa_i \ne 0$. We now claim $\mu_a : k^\# \to k^\#$ is injective:
if $aa_i = aa_j$, then the cancellation law gives $a_i = a_j$. Finally, since $k^\#$ is
finite, the Pigeonhole Principle shows that $\mu_a$ is a bijection. It follows that
$aa_1, aa_2, \ldots, aa_{q-1}$ is just a rearrangement of $a_1, a_2, \ldots, a_{q-1}$. Hence,

\[ a_1 a_2 \cdots a_{q-1} = (aa_1)(aa_2) \cdots (aa_{q-1}) = a^{q-1} a_1 a_2 \cdots a_{q-1}. \]

Now cancel $a_1 a_2 \cdots a_{q-1}$ to obtain $1 = a^{q-1}$. ■
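The rearrangement argument is easy to watch in the prime field $\mathbb{F}_p$. The sketch below assumes $p = 11$ (any prime would do) and represents $\mathbb{F}_p$ by the integers mod $p$; the name `mu` mirrors $\mu_a$ from the proof, and the rest is illustrative.

```python
p = 11  # an arbitrary prime; F_p is modeled by the integers mod p
nonzero = list(range(1, p))


def mu(a):
    """The list a*a_1, ..., a*a_{p-1} of products with every nonzero element."""
    return [(a * x) % p for x in nonzero]


# Multiplication by a permutes the nonzero elements ...
all_permutations = all(sorted(mu(a)) == nonzero for a in nonzero)

# ... and consequently a^(p-1) = 1 for every nonzero a.
all_powers_one = all(pow(a, p - 1, p) == 1 for a in nonzero)

print(mu(2))
print(all_permutations, all_powers_one)
```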

We now show, given a prime power $q = p^n$, that there exists a field with
$p^n$ elements. Our guess is that Galois realized that $\mathbb{C}$ can be constructed by
adjoining a root of $x^2 + 1$ to $\mathbb{R}$, so that it was natural for him (but not for
any of his contemporaries!) to adjoin a root of a polynomial to $\mathbb{F}_p$. However,
Kronecker’s Theorem was not proved until a half century after Galois’s death.

Theorem 7.38 (Galois). If $p$ is prime and $n$ is a positive integer, then there
exists a field having exactly $p^n$ elements.

Proof. Write $q = p^n$. In light of Lemma 7.37, it is natural to consider roots of
the polynomial

\[ g(x) = x^q - x \in \mathbb{F}_p[x]. \]

By Kronecker’s Theorem, there is a field extension $K/\mathbb{F}_p$ with $g$ a product of
linear factors in $K[x]$. Define

\[ E = \{z \in K : g(z) = 0\}; \]

that is, $E$ is the set of all the roots of $g$. We claim that all the roots of $g$ are
distinct. Since the derivative $g'(x) = qx^{q-1} - 1 = p^n x^{q-1} - 1 = -1$ (by
Proposition 7.17), we have $\gcd(g, g') = 1$. By Exercise 6.40 on page 263, all
the roots of $g$ are, indeed, distinct; that is, $E$ has exactly $q = p^n$ elements.

The theorem will follow if $E$ is a subfield of $K$. Of course, $1 \in E$. If $a$,
$b \in E$, then $a^q = a$ and $b^q = b$. Hence, $(ab)^q = a^q b^q = ab$, and $ab \in E$.
By Proposition 7.17, $(a + b)^q = a^q + b^q = a + b$, so that $a + b \in E$.
Therefore, $E$ is a subring of $K$. Finally, if $a \ne 0$, then cancelling $a$ from $a^q = a$
gives $a^{q-1} = 1$ (as in Lemma 7.37), and so the inverse of $a$ is $a^{q-2}$ (which lies
in $E$ because $E$ is closed under multiplication). ■
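The set $E$ of roots of $x^q - x$ can be computed inside a small ambient field. This sketch assumes the ambient field is the nine-element field $\mathbb{F}_3[x]/(x^2 + 1)$ (a stand-in for the field $K$ supplied by Kronecker's Theorem), with $a + bz$ stored as `(a, b)` and $z^2 = -1$; the function names are illustrative.

```python
from itertools import product

P = 3
K = list(product(range(P), repeat=2))  # (a, b) stands for a + b*z, with z^2 = -1


def add(u, v):
    """Add two elements coordinatewise mod 3."""
    return ((u[0] + v[0]) % P, (u[1] + v[1]) % P)


def mul(u, v):
    """Multiply (a + b*z)(c + d*z) using z^2 = -1."""
    a, b = u
    c, d = v
    return ((a * c - b * d) % P, (a * d + b * c) % P)


def power(u, n):
    """Compute u^n by repeated multiplication."""
    result = (1, 0)
    for _ in range(n):
        result = mul(result, u)
    return result


def roots_of_x_q_minus_x(q):
    """All u in K with u^q = u, that is, the roots of x^q - x in K."""
    return [u for u in K if power(u, q) == u]


E3 = roots_of_x_q_minus_x(3)
E9 = roots_of_x_q_minus_x(9)
print(E3)
print(len(E9))
```

For $q = 3$ the roots form the three-element subfield $\mathbb{F}_3$, and for $q = 9$ every element of the ambient field is a root, in line with the proof.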

Proposition 7.39. If $k$ is a finite field having $q = p^n$ elements, then every
$a \in k$ is a root of $x^q - x$.

Proof. This follows directly from Lemma 7.37. ■

It is remarkable that the next theorem was not proved until the 1890s, 60
years after Galois discovered finite fields.

E. H. Moore was an algebraist who later did research in geometry and
foundations of analysis.

Corollary 7.40 (Moore). Any two finite fields having exactly $p^n$ elements are
isomorphic.

Proof. Let $E$ be a field with $q = p^n$ elements. By Proposition 7.39, every element
of $E$ is a root of $g(x) = x^q - x \in \mathbb{F}_p[x]$, and so $E$ is a splitting field of $g$ over $\mathbb{F}_p$.
Since any two splitting fields of $g$ over $\mathbb{F}_p$ are isomorphic (Corollary 7.36), the
result follows. ■

Finite fields are often called Galois fields in honor of their discoverer. In
light of Corollary 7.40, we may speak of the field with $q$ elements, where
$q = p^n$ is a power of a prime $p$, and we denote it by

\[ \mathbb{F}_q. \]

The next example displays different finite fields with the same number of
elements; by Moore’s Theorem, they are isomorphic.


Example 7.41. (i) In Exercise 4.55 on page 165, we constructed the field $\mathbb{F}_4$
with four elements:

\[ \mathbb{F}_4 = \left\{ \begin{bmatrix} a & b \\ b & a+b \end{bmatrix} : a, b \in \mathbb{F}_2 \right\}. \]

On the other hand, since $f(x) = x^2 + x + 1 \in \mathbb{F}_2[x]$ is irreducible, the
quotient $K = \mathbb{F}_2[x]/(f)$ is a field. By Proposition 7.20, $K$ consists of all
$a + bz$, where $z = x + (f)$ is a root of $f$ and $a, b \in \mathbb{F}_2$. Hence $K$ also
is a field with four elements.

(ii) According to the table in Example 6.57, there are three monic irreducible
quadratics in $\mathbb{F}_3[x]$, namely

\[ p(x) = x^2 + 1, \quad q(x) = x^2 + x - 1, \quad\text{and}\quad r(x) = x^2 - x - 1; \]

each gives rise to a field with $9 = 3^2$ elements, namely quotient rings of
$\mathbb{F}_3[x]$. Let us look at the first two in more detail. Proposition 7.20 says
that $E = \mathbb{F}_3[x]/(p)$ is given by

\[ E = \{a + bz : a, b \in \mathbb{F}_3, \text{ where } z^2 + 1 = 0\}. \]

Similarly, if $F = \mathbb{F}_3[x]/(q)$, then

\[ F = \{a + bu : a, b \in \mathbb{F}_3, \text{ where } u^2 + u - 1 = 0\}. \]

Without Moore’s Theorem, it is not instantly obvious that the two fields
are isomorphic. You can check that the map $\varphi : E \to F$ (found by trial
and error), defined by $\varphi(a + bz) = a + b(1 - u)$, is an isomorphism.

Now $\mathbb{F}_3[x]/(x^2 - x - 1)$ is another field with nine elements; Exercise 7.46
asks for an explicit isomorphism with $E$.

(iii) In Example 6.57, we exhibited eight monic irreducible cubics $p(x) \in \mathbb{F}_3[x]$;
each gives rise to a field $\mathbb{F}_3[x]/(p)$ having $27 = 3^3$ elements, and
Moore’s Theorem says that they are all isomorphic to one another. ▲
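The claim in part (ii) that $\varphi(a + bz) = a + b(1 - u)$ is an isomorphism can be verified exhaustively, since each field has only nine elements. In this sketch, pairs `(a, b)` stand for $a + bz$ in $E$ (with $z^2 = -1$) or $a + bu$ in $F$ (with $u^2 = 1 - u$); the function names are hypothetical.

```python
from itertools import product

P = 3
ELEMENTS = list(product(range(P), repeat=2))


def add(s, t):
    """Addition is coordinatewise in both E and F."""
    return ((s[0] + t[0]) % P, (s[1] + t[1]) % P)


def mul_E(s, t):
    """(a + b*z)(c + d*z) with z^2 = -1."""
    a, b = s
    c, d = t
    return ((a * c - b * d) % P, (a * d + b * c) % P)


def mul_F(s, t):
    """(a + b*u)(c + d*u) with u^2 = 1 - u."""
    a, b = s
    c, d = t
    return ((a * c + b * d) % P, (a * d + b * c - b * d) % P)


def phi(s):
    """phi(a + b*z) = a + b*(1 - u) = (a + b) + (-b)*u."""
    a, b = s
    return ((a + b) % P, (-b) % P)


is_bijection = sorted(phi(s) for s in ELEMENTS) == ELEMENTS
preserves_add = all(
    phi(add(s, t)) == add(phi(s), phi(t)) for s in ELEMENTS for t in ELEMENTS
)
preserves_mul = all(
    phi(mul_E(s, t)) == mul_F(phi(s), phi(t)) for s in ELEMENTS for t in ELEMENTS
)
print(is_bijection, preserves_add, preserves_mul)
```

The check works because $(1 - u)^2 = -1$ in $F$, so $1 - u$ plays the role of $z$.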

The following result is known.

Theorem 7.42 (Primitive Element). Let $K/k$ be a finite field extension; that
is, $[K : k] < \infty$. If either $k$ has characteristic 0 or $K$ is a finite field, then there
exists $\alpha \in K$ such that $K = k(\alpha)$.

Proof. [26], p. 301. ■

Actually, more is known when $K$ is finite: it can be shown that every nonzero
element of $K$ is a power of $\alpha$ (not merely a linear combination of powers of $\alpha$).

Corollary 7.43. For every integer $n \ge 1$, there exists an irreducible polynomial
in $\mathbb{F}_p[x]$ of degree $n$.

Proof. Let $K$ be a finite field with $|K| = p^n$, and write $k = \mathbb{F}_p$. By Theorem 7.42,
$K = k(\alpha)$ for some $\alpha \in K$. Let $h(x) = \operatorname{irr}(\alpha, \mathbb{F}_p)$ be the minimal polynomial of $\alpha$.
Since $h$ is irreducible, Corollary 7.21 gives $\dim_k(K) = \deg(h)$. But if $|K| = p^n$, then
$\dim_k(K) = n$. Therefore, since there exists a finite field with exactly $p^n$ elements,
there exists an irreducible polynomial of degree $n$. ■
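For small cases the irreducible polynomials can simply be listed. The sketch below is a brute-force illustration, not the method of the proof: it relies on the fact that a polynomial of degree 2 or 3 over a field is irreducible exactly when it has no root in that field, so it is restricted to those two degrees. Coefficients are listed from the constant term up, and the function names are illustrative.

```python
from itertools import product


def evaluate(coeffs, x, p):
    """Evaluate a polynomial (coefficients from constant term up) at x mod p."""
    return sum(c * pow(x, i, p) for i, c in enumerate(coeffs)) % p


def monic_irreducibles(p, n):
    """Monic irreducible polynomials of degree n over F_p, for n = 2 or 3 only.

    In these degrees, irreducible is equivalent to having no root in F_p.
    """
    if n not in (2, 3):
        raise ValueError("the no-root test is only valid for degrees 2 and 3")
    found = []
    for lower in product(range(p), repeat=n):
        coeffs = lower + (1,)  # append the leading coefficient 1
        if all(evaluate(coeffs, x, p) != 0 for x in range(p)):
            found.append(coeffs)
    return found


quadratics = monic_irreducibles(3, 2)
cubics = monic_irreducibles(3, 3)
print(len(quadratics), len(cubics))
```

Over $\mathbb{F}_3$ this finds three quadratics and eight cubics, the counts quoted from Example 6.57.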


Exercises 

7.39 Let $f(x), g(x) \in k[x]$ be monic polynomials, where $k$ is a field. If $g$ is irreducible
and every root of $f$ (in an appropriate splitting field) is also a root of $g$, prove that
$f = g^m$ for some integer $m \ge 1$.

Hint. Use induction on $\deg(f)$.

7.40 Determine whether any of the following pairs of fields are isomorphic. 

(i) Q(i) and Q(^(l + i)) 

(ii) Q(i) and Q(i/3) 

(iii) $\mathbb{Q}(\sqrt{2})$ and $\mathbb{Q}(\sqrt{3})$

(iv) $\mathbb{Q}(\sqrt{2})$ and $\mathbb{Q}(\sqrt{6})$

7.41 Let $f(x) = s_0 + s_1 x + \cdots + s_{n-1}x^{n-1} + x^n \in k[x]$, where $k$ is a field, and
suppose that $f(x) = (x - z_1)(x - z_2) \cdots (x - z_n)$, where the $z_i$ lie in some splitting
field. Prove that $s_{n-1} = -(z_1 + z_2 + \cdots + z_n)$ and $s_0 = (-1)^n z_1 z_2 \cdots z_n$.
Conclude that the sum and product of all the roots of $f$ lie in $k$.

7.42 (i) Show that $[\mathbb{Q}(\cos(2\pi/7)) : \mathbb{Q}] = 3$.

(ii) Find the minimal polynomial for $\cos(2\pi/7)$ over $\mathbb{Q}$.

(iii) Find all the roots of this polynomial.

Hint. See Exercise 3.56 on page 116.

7.43 Suppose that $K/k$ is an extension field and that $A \subseteq K$. Show that $k(A)$ is the
smallest subfield of $K$ containing $k$ and $A$; that is, if $E$ is any subfield of $K$
containing $k$ and $A$, then $k(A) \subseteq E$.

7.44 Prove Lemma 7.34.

7.45 Using the setup from Example 7.41(i), show that the map $\varphi : \mathbb{F}_4 \to K$, defined
by $\varphi\left(\begin{bmatrix} a & b \\ b & a+b \end{bmatrix}\right) = a + bz$, is an isomorphism.

7.46 Using the setup from Example 7.41(ii), show that

\[ \mathbb{F}_3[x]/(x^2 + 1) \cong \mathbb{F}_3[x]/(x^2 - x - 1) \]

without using Corollary 7.40.

7.47 Prove that $\mathbb{F}_3[x]/(x^3 - x^2 + 1) \cong \mathbb{F}_3[x]/(x^3 - x^2 + x + 1)$ without using
Corollary 7.40.

7.48 Write addition and multiplication tables for the field $\mathbb{F}_8$ with eight elements using
an irreducible cubic over $\mathbb{F}_2$.

7.49 (i) Is $\mathbb{F}_4$ isomorphic to a subfield of $\mathbb{F}_8$?

(ii) For a prime $p$, prove that if $\mathbb{F}_{p^n}$ is isomorphic to a subfield of $\mathbb{F}_{p^m}$, then $n \mid m$
(the converse is also true).

Hint. View $\mathbb{F}_{p^m}$ as a vector space over $\mathbb{F}_{p^n}$.

7.3 Connections: Ruler-Compass Constructions

There are myths in several ancient civilizations in which the gods demand 
precise solutions of mathematical problems in return for granting relief from 
catastrophes. We quote from van der Waerden [35], 

In the dialogue Platonikos of Eratosthenes, a story was told about the 
problem of doubling the cube. According to this story, as Theon of 
Smyrna recounts it in his book Exposition of mathematical things use- 
ful for the reading of Plato, the Delians asked for an oracle in order to 
be liberated from a plague. The god (Apollo) answered through the ora- 
cle that they had to construct an altar twice as large as the existing one 
without changing its shape. The Delians sent a delegation to Plato, who 
referred them to the mathematicians Eudoxus and Helikon of Kyzikos. 

The altar was cubical in shape, and so the problem involves constructing $\sqrt[3]{2}$
(the volume of a cube with edges of length $\ell$ is $\ell^3$). The gods were cruel,
for although there is a geometric construction of $\sqrt{2}$ (it is the length of the
diagonal of a square with sides of length 1), we are going to prove that it is
impossible to construct $\sqrt[3]{2}$ by the methods of Euclidean geometry; that is,
by using only ruler and compass. (Actually, the gods were not so cruel, for
the Greeks did use other methods. Thus, Menaechmus constructed $\sqrt[3]{2}$ as the
intersection of the parabolas $y^2 = 2x$ and $x^2 = y$; this is elementary for
us, but it was an ingenious feat when there was no analytic geometry and no
algebra. There was also a solution found by Nicomedes.)
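Menaechmus's intersection can be located numerically. On the parabola $x^2 = y$, the condition $y^2 = 2x$ becomes $x^4 = 2x$, so a nonzero intersection point has $x^3 = 2$. The following sketch finds that point by bisection on $[1, 2]$; the function name, the bracket, and the tolerance are illustrative assumptions, and the result is only a floating-point approximation.

```python
def nonzero_intersection(tol=1e-14):
    """Approximate the nonzero intersection of y^2 = 2x and x^2 = y by bisection."""
    lo, hi = 1.0, 2.0  # x^3 - 2 is negative at 1 and positive at 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid**3 < 2:
            lo = mid
        else:
            hi = mid
    x = (lo + hi) / 2
    return x, x * x  # the point lies on x^2 = y by construction


x, y = nonzero_intersection()
print(x, y)
```

The $x$-coordinate approximates $\sqrt[3]{2}$; this is an approximation in exactly the sense the section goes on to distinguish from an exact construction.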

There are several other geometric problems handed down from the Greeks.
Can one trisect every angle? Can one construct a regular $n$-gon? More precisely,
can one inscribe a regular $n$-gon in the unit circle? Can one “square the
circle”; that is, can one construct a square whose area is equal to the area of
a given circle? Since the disk with radius 1 has area $\pi$, can one construct a
square with sides of length $\sqrt{\pi}$?

If we are not careful, some of these problems appear ridiculously easy. For 
example, a 60° angle can be trisected using a protractor: just find 20° and draw 
the angle. Thus, it is essential to state the problems carefully and to agree on 
certain ground rules. The Greek problems specify that only two tools, ruler 
and compass, are allowed, and each must be used in only one way. The goal 
of this section is to determine exactly what can be constructed using the two 
“Euclidean tools.” The answer will involve some surprising applications of 
ideas from this chapter. 


How to Think About It. In many geometry classes, constructions are now
taught using dynamic geometry software. These environments can be used in
the same way that one uses physical rulers and compasses; the principles are
the same, and what’s possible in them is what’s possible with pencil and paper.
This brings up an important point. Constructions made in dynamic geometry
environments are likely to be more accurate than those carried out with pencil
and paper, but the goal here is not approximation: we are not content with
constructing $\sqrt[3]{2}$ correct to 100 decimal places; the goal is to find $\sqrt[3]{2}$ exactly,
just as we can find $\sqrt{2}$ exactly as the length of the diagonal of the unit square.
We now seek to determine just what constructions are possible, and so we must
use precise definitions.


Notation. Let $P$ and $Q$ be points in the plane; we denote the line segment with
endpoints $P$ and $Q$ by $\overline{PQ}$, and we denote its length by $PQ$. If $P$ and $Q$ are
points, we’ll let $L(P, Q)$ denote the line through $P$ and $Q$, and $C(P, Q)$ the
circle with center $P$ and radius $PQ$. We’ll also denote the circle with center $P$
and radius $r$ (for a positive number $r$) by $C(P, r)$.

The formal discussion begins with defining the tools by saying exactly what
each is allowed to do.

In many high school texts, $L(P, Q)$ is written as $\overleftrightarrow{PQ}$. Of course, we can’t
physically draw the infinite line $L(P, Q)$, but $\overline{PQ}$ has endpoints and $L(P, Q)$
does not.

Definition. A ruler is a tool that can be used to draw the line $L(P, Q)$ determined
by points $P$ and $Q$.

A compass is a tool that can be used to draw circles; given two points $P$
and $Q$, it can draw $C(P, Q)$ and $C(Q, P)$.

What we are calling a ruler, others call a straightedge . For them, a ruler 
can be used not only to draw lines but to measure distances as well. 




310 Chapter 7 Quotients, Fields, and Classical Problems 


How to Think About It. Just to show you how fussy we are, let us point
out a subtlety about what a compass cannot do. Suppose we are given three
points: $P$, $Q$, and $R$. We are allowed to draw the circle $C(P, Q)$ with center $P$
and radius $r = PQ$. But we are not allowed to draw the circle $C(R, r)$ with
center $R$ and radius $r$. Reason: a compass is allowed to draw a circle only
if two points are given at the outset; but the circle $C(R, r)$ cannot be drawn
(using the compass as in the definition) because only one point, namely $R$, is
given at the outset. Our compass is called a collapsible compass as compared
to the more versatile compass that’s allowed to draw $C(R, r)$. We mention this
now only because the proof of Theorem 7.48(ii) may appear more complicated
than necessary (we’ll say something more there).


About 425 bce, Hippias 
of Elis was able to square 
the circle by drawing a 
certain curve as well as 
lines and circles. We shall 
see that this construction is 
impossible using only ruler 
and compass. 


Constructions with ruler and compass are carried out in the plane. Since ev- 
ery construction has only a finite number of steps, we shall be able to define 
constructible points inductively. Once this precise definition is given, we will 
be able to show that it is impossible to double the cube or to trisect arbitrary 
angles using only a ruler and compass. Angles such as 90° and 45° can be 
trisected using a ruler and compass (for we can construct a 30° angle, which 
can then be bisected), but we shall see that a 60° angle is impossible to tri- 
sect. When we say impossible , we mean what we say; we do not mean that it is 
merely very difficult. You should ponder how anything can be proved to be im- 
possible. This is an important idea, and we recommend letting students spend 
an evening trying to trisect a 60° angle by themselves as one step in teaching 
them the difference between hard and impossible. 

Given the plane, we establish a coordinate system by first choosing two distinct
points, $A$ and $A'$; call the line they determine the $x$-axis. Use a compass
to draw the two circles $C(A, A')$ and $C(A', A)$ of radius $AA'$ with centers $A$
and $A'$, respectively (see Figure 7.1). These two circles intersect in two points
$P_1$ and $P_2$; the line $L(P_1, P_2)$ they determine is called the $y$-axis; it is the
perpendicular-bisector of $\overline{AA'}$, and it intersects the $x$-axis in a point $O$, called
the origin. We define the distance $OA$ to be 1. We have introduced coordinates
into the plane; of course, $O = (0, 0)$, $A = (1, 0)$, and $A' = (-1, 0)$.
Consider the point $P_1$ in Figure 7.1. Now $\triangle OAP_1$ is a right triangle with legs
$\overline{OA}$ and $\overline{OP_1}$. The hypotenuse $\overline{AP_1}$ has length $2 = AA'$ (for this is the radius
of $C(A, A')$). Since $OA = 1$, the Pythagorean Theorem gives $P_1 = (0, \sqrt{3})$.
Similarly, $P_2 = (0, -\sqrt{3})$.
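The coordinates of $P_1$ and $P_2$ can be confirmed by intersecting the two circles numerically. This is a floating-point sketch using the standard formula for the intersection of two circles; the function `circle_intersections` and its argument layout are illustrative, and labeling the upper point $P_1$ follows the coordinates stated in the text.

```python
import math


def circle_intersections(c1, r1, c2, r2):
    """Return the intersection points of two circles given by center and radius."""
    (x1, y1), (x2, y2) = c1, c2
    d = math.hypot(x2 - x1, y2 - y1)
    if d == 0 or d > r1 + r2 or d < abs(r1 - r2):
        return []  # concentric, too far apart, or one inside the other
    a = (r1**2 - r2**2 + d**2) / (2 * d)      # distance from c1 to the common chord
    h = math.sqrt(max(r1**2 - a**2, 0.0))     # half the length of the common chord
    mx = x1 + a * (x2 - x1) / d
    my = y1 + a * (y2 - y1) / d
    ox = -h * (y2 - y1) / d
    oy = h * (x2 - x1) / d
    return [(mx + ox, my + oy), (mx - ox, my - oy)]


A, A_prime = (1.0, 0.0), (-1.0, 0.0)
radius = 2.0  # AA' = 2, the radius of both C(A, A') and C(A', A)

points = circle_intersections(A, radius, A_prime, radius)
P1, P2 = sorted(points, key=lambda pt: pt[1], reverse=True)  # P1 is the upper point
print(P1, P2)
```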

Informally, we construct a new point $Q$ from old points $E$, $F$, $G$, and $H$ by
using the first pair $E \ne F$ to draw a line or circle, the second pair $G \ne H$ to
draw a line or circle, and then obtaining $Q$ as one of the points of intersection
of the two lines, of the line and the circle, or of the two circles. More generally,
a point is called constructible if it is obtained from $A$ and $A'$ by a finite number
of such steps.

Figure 7.1. The first constructible points.

Given a pair of constructible points, we do not assert that every point on the
line or the circle they determine is constructible. For example, we can draw the
$x$-axis $L(A', A)$, but, as we’ll see, not every point on it is constructible.

We now begin the formal discussion. Our goal is Theorem 7.52, which gives
an algebraic characterization of constructibility. Recall, given distinct points
$P$ and $Q$ in the plane, that $L(P, Q)$ is the line they determine and $C(P, Q)$ is
the circle with center $P$ and radius $PQ$.

Definition. Let $E \ne F$ and $G \ne H$ be points in the plane. A point $Q$ is
constructible from $E$, $F$, $G$, and $H$ if either

(i) $Q \in L(E, F) \cap L(G, H)$, where $L(E, F) \ne L(G, H)$;

(ii) $Q \in L(E, F) \cap C(G, H)$;

(iii) $Q \in C(E, F) \cap C(G, H)$, where $C(E, F) \ne C(G, H)$.

A point $Q$ is constructible if $Q = A$, $Q = A'$, or there are points $P_1, \ldots, P_n$
with $Q = P_n$ such that every point $P_{j+1}$ (for $1 \le j$) is constructible from
points $E, F, G, H$ in $\{A, A', P_1, \ldots, P_j\}$.

If $L(E, F) \ne L(G, H)$ and $L(E, F)$ is not parallel to $L(G, H)$, then
$L(E, F) \cap L(G, H)$ is a single point (there are at most two points comprising
any of these intersections).

We illustrate the formal definition of constructibility by showing that every 
angle can be bisected with ruler and compass. 

Lemma 7.44. (i) The perpendicular-bisector of a given line segment AB
can be drawn.

(ii) If A and B are constructible points, then the midpoint of AB is
constructible.

(iii) If a point $P = (\cos\theta, \sin\theta)$ is constructible, then $Q = (\cos(\theta/2), \sin(\theta/2))$
is constructible.

Proof. (i) The construction is the same as in Figure 7.1. Here, there are two
points of intersection of the circles C(A, B) and C(B, A), say $P_1$ and $P_2$,
and $L(P_1, P_2)$ is the perpendicular-bisector of AB.

(ii) The midpoint is the intersection of AB and its perpendicular-bisector.



Figure 7.2. Bisecting an angle. 
We know that the Parallel
Postulate is not true in 
non-Euclidean geometry. 
What hidden hypotheses 
of Euclidean geometry 
are we using to make this 
construction? 


(iii) The point A = (1, 0) is constructible. By (i), the perpendicular-bisector
L(O, X) of the chord PA can be drawn. The point Q lies in the intersection
of L(O, X) and the unit circle, and so Q is constructible. (See
Figure 7.2.) ■

Here is the (tricky) constructible version of the Parallel Postulate. 

Lemma 7.45. If U, V, P are distinct constructible points with $P \notin L(U, V)$,
then there is a constructible point Q with L(P, Q) parallel to L(U, V).



Proof. The proof refers to Figure 7.3. Choose U so that L(P, U) is not
perpendicular to L = L(U, V). Thus, L is not tangent to the circle C(P, U), and
so C(P, U) meets L in another point, say B (of course, B is constructible).
Let $Q \in C(P, U) \cap C(U, P)$. Clearly, Q is constructible, and we claim that
L(P, Q) is parallel to L. Indeed, we claim that the quadrilateral PBUQ is
a rhombus and hence it is a parallelogram. Now PQ is a radius of C(P, U),
PU is a radius of both C(P, U) and C(B, U), and BU is a radius of both
C(B, U) and C(U, P). Hence, PQ = PU = PB, as we want. ■

In high school geometry, the goal is to construct certain figures with ruler 
and compass. We are about to shift the focus, considering instead the notion 
of constructible numbers. “Numbers?” Well, analytic geometry equips points 
with coordinates, and we have seen how to regard points as complex numbers. 

Definition. A complex number z = x + iy is constructible if the point (x, y ) 
is a constructible point. 

Exercise 7.51 on page 326 shows that every element of Z[i] is a constructible
number. So is our old friend $\omega = -\tfrac{1}{2} + i\tfrac{\sqrt{3}}{2}$.


How to Think About It. We asked you earlier to contemplate how we could 
prove that something is impossible. The basic strategy is an elaborate indirect 
proof: assuming a certain point Y is constructible, we will reach a contra- 
diction. The first step is essentially analytic geometry: replace points by their 
coordinates, as we have just done by defining constructible complex numbers. 
The next step involves modern algebra; don’t just consider one constructible 
number; consider the set K of all constructible numbers, for the totality of 
them may have extra structure that we can exploit. In fact, we will see that K 
is a subfield of C. Not only can we translate points to numbers, we can also
translate the definition of constructibility into algebra. If a point $P_{n+1}$ is
constructible from (constructible) points $P_0, P_1, \dots, P_n$, we shall see that its
complex brother $P_{n+1}$ is algebraic over the subfield $F = \mathbb{Q}(P_0, P_1, \dots, P_n)$.
Given that lines have linear equations and circles have quadratic equations,
it is not surprising that $[F(P_{n+1}) : F] \le 2$. The ultimate criterion that a point
Y be constructible is essentially that $[\mathbb{Q}(Y) : \mathbb{Q}]$ is a power of 2.

It follows, for example, that the classical geometric problem of duplicating
the cube corresponds to the algebraic problem of checking whether
$[\mathbb{Q}(\sqrt[3]{2}) : \mathbb{Q}]$ is a power of 2. Since this degree is 3, the assumption that we
can duplicate the cube leads to the contradiction in arithmetic that $3 = 2^k$ for
some integer k.


We continue our discussion of constructibility. 

Lemma 7.46. A complex number z = a + ib is constructible if and only if its 
real and imaginary parts are constructible. 



Figure 7.4. Real and imaginary parts.


Proof. If z = a + ib is constructible, then construct lines through P = (a, b)
parallel to each axis (Lemma 7.45). The intersection of the vertical line and
the x-axis is A = (a, 0), so that A is constructible, and hence a = a + 0i is a
constructible real number. Similarly, the point Q = (0, b), the intersection of
the horizontal line and the y-axis, is constructible. It follows that B = (b, 0) is
constructible, for it is an intersection point of the x-axis and C(O, Q). Hence,
b = b + 0i is a constructible real number.

Conversely, assume that a and b are constructible real numbers; that is,
(a, 0) and B = (b, 0) are constructible points. The point Q = (0, b) is
constructible, being the intersection of the y-axis and C(O, B). By Lemma 7.45,
the vertical line through (a, 0) and the horizontal line through (0, b) can be
drawn, and (a, b) is their intersection. Therefore, (a, b) is a constructible point,
and so z = a + ib is a constructible number. ■

Definition. Denote the subset of C of all constructible numbers by K . 

The next lemma allows us to focus on real constructible numbers. 

Lemma 7.47. (i) If $K \cap \mathbb{R}$ is a subfield of $\mathbb{R}$, then K is a subfield of $\mathbb{C}$.

(ii) If $K \cap \mathbb{R}$ is a subfield of $\mathbb{R}$ and if $\sqrt{a} \in K$ whenever $a \in K \cap \mathbb{R}$ is
positive, then K is closed under square roots.

Proof. (i) If z = a + ib and w = c + id are constructible, then
$a, b, c, d \in K \cap \mathbb{R}$, by Lemma 7.46. Hence, $a + c, b + d \in K \cap \mathbb{R}$, because
$K \cap \mathbb{R}$ is a subfield, and so $(a + c) + i(b + d) \in K$, by Lemma 7.46. Similarly,
$zw = (ac - bd) + i(ad + bc) \in K$. If $z \ne 0$, then $z^{-1} = (a/z\bar{z}) - i(b/z\bar{z})$.
Now $a, b \in K \cap \mathbb{R}$, by Lemma 7.46, so that $z\bar{z} = a^2 + b^2 \in K \cap \mathbb{R}$,
because $K \cap \mathbb{R}$ is a subfield of $\mathbb{R}$. Therefore, $z^{-1} \in K$.

(ii) If $z = a + ib \in K$, then $a, b \in K \cap \mathbb{R}$, by Lemma 7.46, and so
$r^2 = a^2 + b^2 \in K \cap \mathbb{R}$, as in part (i). Since $r^2$ is nonnegative, the hypothesis
gives $r \in K \cap \mathbb{R}$ and $\sqrt{r} \in K \cap \mathbb{R}$.

Now $z = r(\cos\theta + i\sin\theta)$, so that $\cos\theta + i\sin\theta = r^{-1}z \in K$,
because K is a subfield of $\mathbb{C}$ by part (i). By Lemma 7.44, $\cos(\theta/2) + i\sin(\theta/2)$
can be constructed, and hence is in K. But
$\sqrt{z} = \sqrt{r}\,(\cos(\theta/2) + i\sin(\theta/2)) \in K$, as desired. ■


Theorem 7.48. The set of all constructible numbers K is a subfield of C that
is closed under square roots and complex conjugation.


Proof. It suffices to prove that the properties of $K \cap \mathbb{R}$ in Lemma 7.47 hold.
Let a and b be constructible real numbers.


(i) $-a$ is constructible.

If P = (a, 0) is a constructible point, then $(-a, 0)$ is the other
intersection of the x-axis and C(O, P).

(ii) a + b is constructible. 


Figure 7.5. a + b.


You are tempted to use 
a compass with center I 
and radius b to draw 
C(P,b). But this is illegal. 
Remember: we’re using a 
collapsible compass that 
requires two points given 
at the outset; here, only 
one is available, namely P 


Assume that a and b are positive. Let I = (0, 1), P = (a, 0), and
Q = (b, 1). Now Q is constructible: it is the intersection of the horizontal
line through I and the vertical line through (b, 0), both of which can be
drawn by Lemma 7.45 (the latter point is constructible, by hypothesis).
The line through Q parallel to IP intersects the x-axis in S = (a + b, 0),
as desired.

To construct $b - a$, let $P = (-a, 0)$ in Figure 7.5. Thus, $a + b$ and
$-a + b$ are constructible; by part (i), $-a - b$ and $a - b$ are also
constructible. Thus, a + b is constructible, no matter whether a and b are
both positive, both negative, or have opposite sign.

(iii) ab is constructible. 

By part (i), we may assume that both a and b are positive. In Figure 7.6,
A = (1, 0), B = (1 + a, 0), and C = (0, b). Define D to be the
intersection of the y-axis and the line through B parallel to AC. Since the
triangles $\triangle OAC$ and $\triangle OBD$ are similar,


OB/OA = OD/OC;

hence (a + 1)/1 = (b + CD)/b, and CD = ab. Therefore, b + ab
is constructible. Since $-b$ is constructible, by part (i), we have
$ab = (b + ab) - b$ constructible, by part (ii).

(iv) If $a \ne 0$, then $a^{-1}$ is constructible.



Let A = (1, 0), S = (0, a), and T = (0, 1 + a). Define B as the
intersection of the x-axis and the line through T parallel to AS; thus,
B = (1 + u, 0) for some u. Similarity of the triangles $\triangle OSA$ and $\triangle OTB$
gives


OT/OS = OB/OA.

Hence, (1 + a)/a = (1 + u)/1, and so $u = a^{-1}$. Therefore, $1 + a^{-1}$ is
constructible, and so $(1 + a^{-1}) - 1 = a^{-1}$ is constructible.

(v) If a > 0, then $\sqrt{a}$ is constructible.



Let A = (1, 0) and P = (1 + a, 0); construct Q, the midpoint of
OP (if U, V are constructible points, then the midpoint of the segment
UV is its intersection with the perpendicular-bisector, constructed as in
Figure 7.1). Define R as the intersection of the circle C(Q, O) with the
vertical line through A. The (right) triangles $\triangle AOR$ and $\triangle ARP$ are similar,
so that


OA/AR = AR/AP,


and hence $AR = \sqrt{a}$.

(vi) If $z = a + ib \in K$, then $\bar{z} = a - ib$ is constructible.

By Lemma 7.47, K is a subfield of C. Now $a, b \in K$, by Lemma 7.46,
and $i \in K$, as we saw on page 310. It follows that $-bi \in K$, and so
$a - ib \in K$. ■
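
The constructions in parts (ii) through (v) can be checked numerically by replaying them in coordinates. The following Python sketch is only an illustration under stated assumptions: the helper `intersect` and the four `construct_*` names are hypothetical, and floating-point arithmetic stands in for exact ruler-and-compass steps.

```python
import math

def intersect(p, d, q, e):
    """Intersection of the lines p + s*d and q + t*e (assumed non-parallel)."""
    det = e[0] * d[1] - d[0] * e[1]
    rx, ry = q[0] - p[0], q[1] - p[1]
    s = (e[0] * ry - e[1] * rx) / det
    return (p[0] + s * d[0], p[1] + s * d[1])

X_AXIS = ((0.0, 0.0), (1.0, 0.0))   # (point on the line, direction)
Y_AXIS = ((0.0, 0.0), (0.0, 1.0))

def construct_sum(a, b):
    # Line through Q = (b, 1) parallel to IP, where I = (0, 1) and P = (a, 0).
    s_point = intersect((b, 1.0), (a, -1.0), *X_AXIS)
    return s_point[0]                 # S = (a + b, 0)

def construct_product(a, b):
    # Line through B = (1 + a, 0) parallel to AC, where A = (1, 0) and C = (0, b).
    d_point = intersect((1.0 + a, 0.0), (-1.0, b), *Y_AXIS)
    return d_point[1] - b             # CD = OD - OC

def construct_inverse(a):
    # Line through T = (0, 1 + a) parallel to AS, where A = (1, 0) and S = (0, a).
    b_point = intersect((0.0, 1.0 + a), (-1.0, a), *X_AXIS)
    return b_point[0] - 1.0           # B = (1 + u, 0)

def construct_sqrt(a):
    # Circle C(Q, O), Q the midpoint of OP with P = (1 + a, 0), meets x = 1 at R.
    r = (1.0 + a) / 2.0               # radius, and x-coordinate of Q
    return math.sqrt(r * r - (1.0 - r) ** 2)   # the length AR
```

For instance, `construct_product(2.0, 3.0)` returns a value within rounding error of 6, matching CD = ab in part (iii).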

Corollary 7.49. If a, b, c are constructible, then the roots of the quadratic
$ax^2 + bx + c$ are constructible.

Proof. This follows from Theorem 7.48 and the quadratic formula. ■ 

We now consider subfields of C to enable us to prove an inductive step in 
the upcoming theorem. 

Lemma 7.50. Let F be a subfield of C containing i that is closed under
complex conjugation. Let $z = a + ib, w = c + id \in F$, and let P = (a, b)
and Q = (c, d).

(i) If $a + ib \in F$, then $a \in F$ and $b \in F$.

(ii) If the equation of L(P, Q) is y = mx + q, where $m, q \in \mathbb{R}$, then
$m, q \in F$.

(iii) If the equation of C(P, Q) is $(x - a)^2 + (y - b)^2 = r^2$, where $a, b, r \in \mathbb{R}$,
then $r^2 \in F$.

Proof. (i) If $z = a + ib \in F$, then $a = \tfrac{1}{2}(z + \bar{z}) \in F$ and
$ib = \tfrac{1}{2}(z - \bar{z}) \in F$; since we are assuming that $i \in F$, we have
$b = -i(ib) \in F$.

(ii) By (i), the numbers a, b, c, d lie in F. Hence, $m = (d - b)/(c - a) \in F$
and $q = b - ma \in F$.

(iii) The circle C(P, Q) has equation $(x - a)^2 + (y - b)^2 = r^2$, and $r^2 \in F$
because $r^2 = (c - a)^2 + (d - b)^2$. ■

As we said earlier, the next result is intuitively obvious, for the equation of 
a line is linear and the equation of a circle is quadratic. However, the coming 
proof involves some calculations. 

Lemma 7.51. Let F be a subfield of C containing i and which is closed under
complex conjugation. Let P, Q, R, S be points whose coordinates lie in F,
and let $\alpha = u + iv \in \mathbb{C}$. If any of the following is true,

$\alpha \in L(P, Q) \cap L(R, S)$, where $L(P, Q) \ne L(R, S)$,

$\alpha \in L(P, Q) \cap C(R, S)$,

$\alpha \in C(P, Q) \cap C(R, S)$, where $C(P, Q) \ne C(R, S)$,


then $[F(\alpha) : F] \le 2$.
Proof. If L(P, Q) is not vertical, then Lemma 7.50(ii) says that L(P, Q) has
equation y = mx + q, where $m, q \in F$. If L(P, Q) is vertical, then its equation
is x = a because $P = (a, b) \in L(P, Q)$, and $a \in F$, by Lemma 7.50(i).
Similarly, L(R, S) has equation y = nx + c or x = c, where $n, c \in F$.
Since these lines are not parallel, one can solve the pair of linear equations for
(u, v), the coordinates of $\alpha \in L(P, Q) \cap L(R, S)$, and they also lie in F. In
this case, therefore, $[F(\alpha) : F] = 1$.

Let L(P, Q) have equation y = mx + q or x = a, and let C(R, S) have
equation $(x - c)^2 + (y - d)^2 = r^2$; by Lemma 7.50, we have $m, q, r^2 \in F$.
Since $\alpha = u + iv \in L(P, Q) \cap C(R, S)$,

$r^2 = (u - c)^2 + (v - d)^2$

$= (u - c)^2 + (mu + q - d)^2$,

so that u is a root of a quadratic polynomial with coefficients in $F \cap \mathbb{R}$. Hence,
$[F(u) : F] \le 2$. Since v = mu + q, we have $v \in F(u)$, and, since $i \in F$, we
have $\alpha \in F(u)$. Therefore, $\alpha = u + iv \in F(u)$, and so $[F(\alpha) : F] \le 2$.
(If the line is vertical, then $u = a \in F$, and the same argument shows that v is
a root of a quadratic with coefficients in F.)

Let C(P, Q) have equation $(x - a)^2 + (y - b)^2 = r^2$ and let C(R, S) have
equation $(x - c)^2 + (y - d)^2 = s^2$. By Lemma 7.50, we have $r^2, s^2 \in F \cap \mathbb{R}$.
Since $\alpha \in C(P, Q) \cap C(R, S)$, there are equations

$(u - a)^2 + (v - b)^2 = r^2$ and $(u - c)^2 + (v - d)^2 = s^2$.

After expanding, both equations have the form $u^2 + v^2 + \text{something} = 0$.
Setting the something's equal gives an equation of the form $tu + t'v + t'' = 0$,
where $t, t', t'' \in F$. Coupling this with the equation of one of the circles returns
us to the situation of the second paragraph. ■
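
The two calculations in this proof can be made concrete. The sketch below, which uses exact rational arithmetic as a stand-in for the field F (an assumption for illustration, as are the function names), returns the coefficients of the quadratic satisfied by u in the line-circle case and the coefficients of the linear equation obtained by subtracting two circle equations.

```python
from fractions import Fraction as Fr

def line_circle_quadratic(m, q, c, d, r2):
    """Coefficients (A, B, C) of A*u^2 + B*u + C = 0 for the x-coordinate u
    of a point on both y = m*x + q and (x - c)^2 + (y - d)^2 = r2."""
    A = 1 + m * m
    B = -2 * c + 2 * m * (q - d)
    C = c * c + (q - d) ** 2 - r2
    return A, B, C

def circles_to_line(a, b, r2, c, d, s2):
    """Coefficients (t, t1, t2) of t*u + t1*v + t2 = 0, obtained by subtracting
    (u - c)^2 + (v - d)^2 = s2 from (u - a)^2 + (v - b)^2 = r2."""
    t = 2 * (c - a)
    t1 = 2 * (d - b)
    t2 = (a * a + b * b - r2) - (c * c + d * d - s2)
    return t, t1, t2

# The line y = x meets the circle x^2 + y^2 = 2 where 2u^2 - 2 = 0.
example_quadratic = line_circle_quadratic(Fr(1), Fr(0), Fr(0), Fr(0), Fr(2))

# The circles C(A, A') and C(A', A) of Figure 7.1 (centers (1, 0) and (-1, 0),
# squared radius 4) give -4u = 0, that is, the y-axis.
example_line = circles_to_line(Fr(1), Fr(0), Fr(4), Fr(-1), Fr(0), Fr(4))
```

All outputs are built from the inputs using only field operations, which is the point of the lemma: the coefficients stay in F.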

Here is the criterion we have been seeking: an algebraic characterization of 
a geometric idea; it is an exact translation from geometry into algebra. 

Theorem 7.52. A complex number z is constructible if and only if there is a
tower of fields


$\mathbb{Q} = K_0 \subseteq K_1 \subseteq \cdots \subseteq K_n \subseteq \mathbb{C}$,


where $z \in K_n$ and $[K_{j+1} : K_j] \le 2$ for all j.

Proof. Let z = a + ib, and let P = (a, b) be the corresponding point in the
plane. If z is a constructible number, then P is a constructible point, and so
there is a sequence of points $A, A', P_1, \dots, P_n = P$ with each $P_{j+1}$ obtainable
from $\{A, A', P_1, \dots, P_j\}$; since i is constructible, we may assume that
$P_1 = (0, 1)$. Define


$K_1 = \mathbb{Q}(z_1)$ and $K_{j+1} = K_j(z_{j+1})$,

where $z_j$ corresponds to the point $P_j$ and there are points E, F, G, H lying in
$\{A, A', P_1, \dots, P_j\}$ with one of the following:

$P_{j+1} \in L(E, F) \cap L(G, H)$,

$P_{j+1} \in L(E, F) \cap C(G, H)$,

$P_{j+1} \in C(E, F) \cap C(G, H)$.


See Exercises 7.52 and 7.53 on page 326.

Nicomedes solved the Delian problem of doubling the cube using a marked
ruler and compass.


We may assume, by induction on $j \ge 1$, that $K_j$ is closed under complex
conjugation, so that Lemma 7.51 applies to show that $[K_{j+1} : K_j] \le 2$. Finally,
$K_{j+1}$ is also closed under complex conjugation, for if $z_{j+1}$ is a root of
a quadratic $f(x) \in K_j[x]$, then $\bar{z}_{j+1}$ is the other root of f.

Conversely, given a tower of fields as in the statement, then Theorem 7.48 
and Lemma 7.50 show that z is constructible. ■ 

Corollary 7.53. If a complex number z is constructible, then [Q(z) : Q] is a 
power of 2. 

Proof. This follows from Theorems 7.52 and 7.27: If $k \subseteq E \subseteq K$ are fields
with E/k and K/E finite extension fields, then [K : k] = [K : E][E : k]. ■

The converse of Corollary 7.53 is false; it can be shown that there are non- 
constructible numbers z with [Q(z) : Q] = 4 (see [27], p. 136). 

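Corollary 7.54. (i) The real number $\cos(2\pi/7)$ is not constructible.

(ii) The complex 7th root of unity $\zeta_7$ is not constructible.

Proof. (i) We saw in Example 7.28 that $[\mathbb{Q}(\cos(2\pi/7)) : \mathbb{Q}] = 3$.

(ii) $\zeta_7 = \cos(2\pi/7) + i\sin(2\pi/7)$; if $\zeta_7$ were constructible, then its real
part $\cos(2\pi/7)$ would be constructible, by Lemma 7.46, contradicting (i). ■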

We’ll soon have more to say about constructibility of roots of unity. 

We can now deal with the Greek problems, two of which were solved by 
Wantzel in 1837. The notion of dimension of a vector space was not known 
at that time; in place of Theorem 7.52, Wantzel proved that if a number is 
constructible, then it is a root of an irreducible polynomial in Q[x] of degree
$2^n$ for some n.

Theorem 7.55 (Wantzel). It is impossible to duplicate the cube using only 
ruler and compass. 

Proof. The question is whether $z = \sqrt[3]{2}$ is constructible. Since $x^3 - 2$ is
irreducible, [Q(z) : Q] = 3, by Theorem 7.20; but 3 is not a power of 2. ■

Consider how ingenious this proof is. At the beginning of this section, you 
were asked to ponder how we can prove impossibility. As we said when we 
outlined this argument, the constructibility of a point was translated into al- 
gebra, and the existence of a geometric construction produces an arithmetic 
contradiction. This is a spectacular use of the idea of modeling! 

A student in one of our classes, imbued with the idea of continual progress 
through technology, asked, “Will it ever be possible to duplicate the cube with 
ruler and compass?” Impossible here is used in its literal sense. 

Theorem 7.56 (Wantzel). It is impossible to trisect a 60° angle using only 
ruler and compass. 

Proof. We may assume that one side of the angle is on the x-axis, and so
the question is whether z = cos(20°) + i sin(20°) is constructible. If z were
constructible, then Lemma 7.46 would show that cos(20°) is constructible.
The triple angle formula on page 110 gives

$\cos(3\alpha) = 4\cos^3\alpha - 3\cos\alpha$.

Setting $\alpha = 20^\circ$, we have $\cos 3\alpha = \tfrac{1}{2}$, so that z = cos(20°) is a root of
$4x^3 - 3x - \tfrac{1}{2}$; equivalently, cos(20°) is a root of $f(x) = 8x^3 - 6x - 1 \in \mathbb{Z}[x]$.
A cubic is irreducible in Q[x] if and only if it has no rational roots. By
Theorem 6.52, the only candidates for rational roots are $\pm 1$, $\pm\tfrac{1}{2}$, $\pm\tfrac{1}{4}$, and
$\pm\tfrac{1}{8}$; since none of these is a root, as one easily checks, it follows that f is
irreducible. (Alternatively, we can prove irreducibility using Theorem 6.55,
for $\bar{f}(x) = x^3 + x - 1$ is irreducible in $\mathbb{Z}_7[x]$.) Therefore, 3 = [Q(z) : Q],
by Theorem 7.20(ii), and so z = cos(20°) is not constructible because 3 is not
a power of 2. ■
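
The checks that the proof leaves to the reader are easy to automate. This short Python sketch (variable names are illustrative assumptions) tests the eight rational candidates exactly, tests the reduction modulo 7 for roots, and confirms numerically that cos(20°) is a root of f.

```python
import math
from fractions import Fraction

def f(x):
    """The cubic 8x^3 - 6x - 1 from the proof."""
    return 8 * x**3 - 6 * x - 1

# Rational root candidates: +-1, +-1/2, +-1/4, +-1/8.
candidates = [sign * Fraction(1, d) for d in (1, 2, 4, 8) for sign in (1, -1)]
rational_roots = [c for c in candidates if f(c) == 0]

# Reduction mod 7: 8x^3 - 6x - 1 becomes x^3 + x - 1; a cubic with no root
# in a field is irreducible over that field.
roots_mod_7 = [x for x in range(7) if (x**3 + x - 1) % 7 == 0]

# Floating-point check that cos(20 degrees) is a root of f.
residual = f(math.cos(math.radians(20)))
```

Both `rational_roots` and `roots_mod_7` come out empty, and `residual` is zero up to rounding error.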

Theorem 7.57 (Lindemann). It is impossible to square the circle with ruler 
and compass. 

Proof. The problem is whether we can construct a square whose area is $\pi$,
the area of the unit circle. If the side of the square has length z, we are asking
whether $z = \sqrt{\pi}$ is constructible. Now $\mathbb{Q}(\pi)$ is a subfield of $\mathbb{Q}(\sqrt{\pi})$. We have
already mentioned that Lindemann proved that $\pi$ is transcendental (over Q),
so that $[\mathbb{Q}(\pi) : \mathbb{Q}]$ is infinite. It follows from Corollary A.41 in Appendix A.3
that $[\mathbb{Q}(\sqrt{\pi}) : \mathbb{Q}]$ is also infinite. Thus, $[\mathbb{Q}(\sqrt{\pi}) : \mathbb{Q}]$ is surely not a power of
2, and so $\sqrt{\pi}$ is not constructible. ■

Other construction tools 

If a ruler is allowed not only to draw a line but to measure distance using 
marks on it (as most of our rulers are used nowadays), then the added function 
makes it a more powerful instrument. Both Nicomedes and Archimedes were 
able to trisect arbitrary angles using a marked ruler and a compass; we present 
Archimedes’ proof here. 

Theorem 7.58 (Archimedes). Every angle can be trisected using a marked 
ruler and compass. 

Proof. It is easy to construct $\gamma = 30^\circ$, $\gamma = 60^\circ$, and $\gamma = 90^\circ$. The trigonometric
Addition Formula shows that if $z = \cos\beta + i\sin\beta$ and $z' = \cos\gamma + i\sin\gamma$
can be found, so can $zz' = \cos(\beta + \gamma) + i\sin(\beta + \gamma)$. Now if $3\beta = \alpha$, then
$3(\beta + 30^\circ) = \alpha + 90^\circ$, $3(\beta + 60^\circ) = \alpha + 180^\circ$, and $3(\beta + 90^\circ) = \alpha + 270^\circ$.
Thus, it suffices to trisect an acute angle $\alpha$, for if $\alpha = 3\beta$ and $\beta$ can be found,
then so can $\beta + 30^\circ$, $\beta + 60^\circ$, and $\beta + 90^\circ$ be found.

Figure 7.9. Sliding ruler.

Draw the given angle $\alpha = \angle AOE$, where the origin O is the center of the
unit circle. Take a ruler on which the distance 1 has been marked; that is, there
are points U and V on the ruler with UV = 1. There is a chord through A
parallel to L(E, F); place the ruler so that the chord is AU. Since $\alpha$ is acute,
U lies in the first quadrant. Keeping A on the sliding ruler, move the point U
down the circle; the ruler intersects the extended diameter L(E, F) in some
point X with UX > 1. Continue moving U down the circle, keeping A on the
sliding ruler, until the ruler intersects L(E, F) in the point V.



Figure 7.10. Trisecting a. 

We claim that $\beta = \angle UVO = \tfrac{1}{3}\alpha$. Now

$\alpha = \delta + \beta$,

because $\alpha$ is an exterior angle of $\triangle AOV$, and hence it is the sum of the two
opposite internal angles. Since $\triangle OAU$ is isosceles (OA and OU are radii),
$\delta = \varepsilon$, and so


$\alpha = \varepsilon + \beta$.

But $\varepsilon = \gamma + \beta = 2\beta$, for it is an exterior angle of the isosceles triangle
$\triangle UVO$; therefore,


$\alpha = 2\beta + \beta = 3\beta$. ■
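
The final position of the ruler can be checked in coordinates. The sketch below is an illustration under explicit assumptions that are not taken from the figures: the diameter L(E, F) is placed on the x-axis with E = (1, 0), the point A is $(\cos\alpha, \sin\alpha)$, and V is placed on the negative x-axis, so the picture may be a mirror image of Figure 7.10. With $\beta = \alpha/3$, it verifies that U lies on the unit circle, that UV = 1, and that A, U, V are collinear.

```python
import math

def archimedes_configuration(alpha):
    """Candidate points A, U, V for the marked-ruler trisection of alpha (radians).

    Assumed coordinates: unit circle centered at the origin O, diameter on the
    x-axis, and the ruler making angle beta = alpha / 3 with the x-axis at V.
    """
    beta = alpha / 3
    A = (math.cos(alpha), math.sin(alpha))
    V = (-2 * math.cos(beta), 0.0)              # on the extended diameter
    U = (-math.cos(beta), math.sin(beta))       # one ruler mark, on the circle
    return A, U, V

def is_valid_configuration(alpha, tol=1e-12):
    A, U, V = archimedes_configuration(alpha)
    on_circle = abs(math.hypot(U[0], U[1]) - 1.0) < tol      # OU = 1
    marked = abs(math.dist(U, V) - 1.0) < tol                # UV = 1
    # A, U, V collinear: the cross product of U - V and A - V vanishes.
    cross = (U[0] - V[0]) * (A[1] - V[1]) - (U[1] - V[1]) * (A[0] - V[0])
    return on_circle and marked and abs(cross) < tol
```

Since the ruler through V and A makes angle $\beta$ with the diameter by construction of these coordinates, a valid configuration confirms $\angle UVO = \alpha/3$ for that $\alpha$.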

In addition to investigating more powerful tools, one can look at what can 
be accomplished with fewer tools. It was proved by Mohr in 1672 and, inde- 
pendently, by Mascheroni in 1797, that every geometric construction carried 
out by ruler and compass can be done without the ruler. There is a short proof 
of the theorem given by Hungerbühler in American Mathematical Monthly,
101 (1994), pp. 784-787. 

Constructing Regular n-gons

High school geometry students are often asked to construct various regular 
polygons. In light of our present discussion, we can phrase such problems more 
carefully. 

Which regular polygons can be inscribed in the unit circle using only
ruler and compass?
Because they can construct 90° and 60° angles, high school students can
construct squares and hexagons (just make right and 60° central angles), and they
can connect every other vertex of their hexagon to inscribe an equilateral
triangle. Also, by using the perpendicular-bisector construction, they can inscribe
a regular polygon with twice as many sides as an already constructed one, so
they can inscribe regular polygons with $3 \cdot 2^n$ and $4 \cdot 2^n$ sides for any positive
integer n. Archimedes knew that $\pi$ is the area of the unit circle, and he
approximated it by inscribing and circumscribing a regular 96-gon (he began with a
regular hexagon and then doubled the number of sides four times).

This is about as far as most high school programs get, although some treat
polygons with $5 \cdot 2^n$ sides (using Exercise 3.48 to construct the decagon and
then connecting every other vertex). This is also about as far as Greek geometers
got, although they also were able to show (see Exercise 7.67) that if a
regular m-gon and a regular n-gon are inscribable in a circle (again, with
only ruler and compass) and m and n are relatively prime, then so is a regular
mn-gon; for example, a regular 15-gon can be inscribed. However, it was unknown
whether all regular polygons could be so inscribed.

About 2000 years later, around 1796, Gauss — still in his teens — essentially 
invented the main results in this section, and he applied them to the problem of 
determining whether a regular polygon could be inscribed in a circle with ruler 
and compass (he wrote that his main result on this problem led to his decision 
to become a mathematician). We’ll develop his methods here. 

Theorem 3.28 tells us that the vertices of a regular n-gon inscribed in the
unit circle can be realized in the complex plane as the set of roots to $x^n - 1$,
and that these roots are all powers of

$\zeta_n = \cos(2\pi/n) + i\sin(2\pi/n)$.

So, we can recast our question about inscribability and ask:

For which values of n is $\zeta_n$ a constructible number?

Well, we can hit this question with Theorem 7.52:


We’ll revisit the construction
of the pentagon in just
a minute, putting it in a more
general setting.


Given the development 
so far, you may already 
see that the problem 
can be translated to the 
algebra of constructible 
complex numbers, but 
this was a huge leap 
for mathematicians of 
Gauss’s time and certainly 
out of reach for Greek 
geometers. 


Corollary 7.59. A regular n-gon can be inscribed in the unit circle with ruler
and compass if and only if there is a tower of fields


$\mathbb{Q} = K_0 \subseteq K_1 \subseteq \cdots \subseteq K_n \subseteq \mathbb{C}$,


where $\zeta_n = e^{2\pi i/n}$ lies in $K_n$ and $[K_{j+1} : K_j] \le 2$ for all j.

Proof. Indeed, a regular n-gon can be so inscribed if and only if $\zeta_n$ and, hence,
all its powers, are constructible numbers. ■

Gauss showed how to construct such a tower when n = 17, and his method 
was general in principle, leading to a complete classification of inscribable 
regular polygons. Before we state the main result, let’s work through two
examples as Gauss did (all laid out in detail by him in [14], Section VII).

Example 7.60. In Example 3.34 on page 113, we showed how to find explicit
formulas for the vertices of a regular pentagon inscribed in the unit circle.
Let’s look at this from the perspective of this chapter. Write $\zeta$ instead of $\zeta_5$.
The nonreal roots of $x^5 - 1$, namely $\zeta, \zeta^2, \zeta^3, \zeta^4$, are the roots of the
irreducible polynomial

$\Phi_5(x) = x^4 + x^3 + x^2 + x + 1$.

It follows that

$[\mathbb{Q}(\zeta) : \mathbb{Q}] = 4$,

so Corollary 7.53 tells us that there’s a chance that $\zeta$ is constructible. In
Example 3.34, without using this language, we actually constructed the tower of
quadratic extensions necessary to guarantee that $\zeta$ is, in fact, constructible. We
showed that if $g = \zeta + \zeta^4 = 2\cos(2\pi/5)$ and $h = \zeta^2 + \zeta^3 = 2\cos(4\pi/5)$,
then g and h are roots of the quadratic $x^2 + x - 1$, so that

$[\mathbb{Q}(\zeta + \zeta^4) : \mathbb{Q}] = 2$.

By Theorem 7.27,

$[\mathbb{Q}(\zeta) : \mathbb{Q}(\zeta + \zeta^4)]\,[\mathbb{Q}(\zeta + \zeta^4) : \mathbb{Q}] = [\mathbb{Q}(\zeta) : \mathbb{Q}] = 4$,

so that

$[\mathbb{Q}(\zeta) : \mathbb{Q}(\zeta + \zeta^4)] = 2$,

and we have our tower of quadratic extensions:

$\mathbb{Q} \subseteq \mathbb{Q}(\zeta + \zeta^4) \subseteq \mathbb{Q}(\zeta)$.
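
The two quadratic steps of the pentagon tower can be confirmed numerically. In this Python sketch (floating-point complex arithmetic, with illustrative names), g and h are checked to be real roots of $x^2 + x - 1$, and $\zeta$ is checked to be a root of $x^2 - gx + 1$, a quadratic with coefficients in $\mathbb{Q}(\zeta + \zeta^4)$.

```python
import cmath
import math

zeta = cmath.exp(2j * math.pi / 5)   # primitive 5th root of unity

g = zeta + zeta**4                   # equals 2*cos(2*pi/5)
h = zeta**2 + zeta**3                # equals 2*cos(4*pi/5)

def first_story(x):
    """The quadratic x^2 + x - 1 over Q, with roots g and h."""
    return x * x + x - 1

def second_story(x):
    """The quadratic x^2 - g*x + 1 over Q(g), with roots zeta and zeta^4."""
    return x * x - g * x + 1
```

The second quadratic vanishes at $\zeta$ because $\zeta^2 - (\zeta + \zeta^4)\zeta + 1 = 1 - \zeta^5 = 0$.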

Gauss’s construction of the 17-gon 

Stepping back a bit, we can describe what we did with the pentagon: the
nonreal roots are

$\{\zeta, \zeta^2, \zeta^3, \zeta^4\}$

(we are still writing $\zeta$ instead of $\zeta_5$). There are four roots. The first story of our
tower is $\mathbb{Q}(\zeta + \zeta^4)$, generated by the sums of pairs of the roots: $\zeta + \zeta^4$ and
$\zeta^2 + \zeta^3$. The top story, $\mathbb{Q}(\zeta)$, is generated by the individual roots themselves.

This is the basic idea behind Gauss’s insight into the 17-gon, but the situation
here is more complicated. Change notation again; now let

$\zeta = \zeta_{17} = \cos(2\pi/17) + i\sin(2\pi/17)$.

Because the minimal polynomial of $\zeta$ over Q is

$\Phi(x) = x^{16} + x^{15} + \cdots + x^2 + x + 1$,

we have $[\mathbb{Q}(\zeta) : \mathbb{Q}] = 16$. There are sixteen roots of $\Phi_{17}$: $\{\zeta^k : 1 \le k \le 16\}$.
Together with 1, these points on the unit circle are the vertices of our regular
17-gon.

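For each factorization 16 = ef, Gauss divided the roots into e sums of
f = 16/e roots each: $\eta_{e,0}, \eta_{e,1}, \dots, \eta_{e,e-1}$, where each $\eta_{e,k}$ is a sum of f
roots. For example, he divided the sixteen roots into two sums of eight each,
which we can call $\eta_{2,0}$ and $\eta_{2,1}$, as follows:

$\eta_{2,0} = \zeta + \zeta^9 + \zeta^{13} + \zeta^{15} + \zeta^{16} + \zeta^8 + \zeta^4 + \zeta^2$

$\eta_{2,1} = \zeta^3 + \zeta^{10} + \zeta^5 + \zeta^{11} + \zeta^{14} + \zeta^7 + \zeta^{12} + \zeta^6$.


There are also four sums of four each:


$\eta_{4,0} = \zeta + \zeta^{13} + \zeta^{16} + \zeta^4$

$\eta_{4,1} = \zeta^3 + \zeta^5 + \zeta^{14} + \zeta^{12}$

$\eta_{4,2} = \zeta^9 + \zeta^{15} + \zeta^8 + \zeta^2$

$\eta_{4,3} = \zeta^{10} + \zeta^{11} + \zeta^7 + \zeta^6$.

And there are eight sums of two each:

$\eta_{8,0} = \zeta + \zeta^{16}$

$\eta_{8,1} = \zeta^3 + \zeta^{14}$

$\eta_{8,2} = \zeta^9 + \zeta^8$

$\eta_{8,3} = \zeta^{10} + \zeta^7$

$\eta_{8,4} = \zeta^{13} + \zeta^4$

$\eta_{8,5} = \zeta^5 + \zeta^{12}$

$\eta_{8,6} = \zeta^{15} + \zeta^2$

$\eta_{8,7} = \zeta^{11} + \zeta^6$.

Finally, there are sixteen “sums” of one each, namely
$\{\eta_{16,k} = \zeta^{3^k} \mid 0 \le k \le 15\}$.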


Gauss called each of the $\eta_{e,k}$ a period of length f = 16/e. The plan is
to show that the periods of length eight lie in a quadratic extension $K_1$ of Q,
the periods of length four lie in a quadratic extension $K_2$ of $K_1$, and so on,
building a tower of quadratic extensions ending with $\mathbb{Q}(\zeta)$.

The calculations will sometimes be involved so, once again, pull out your 
pencil or computer. 

• Because $\Phi(\zeta) = 0$, we see that $\eta_{2,0} + \eta_{2,1} = -1$.

• With a little patience and care (or a CAS), you can check that

$\eta_{2,0}\eta_{2,1} = 4(\eta_{2,0} + \eta_{2,1}) = -4$.


We’ll see what method 
Gauss used to partition the 
roots into these sums in 
just a minute. 


Each period of length > 1 is a real number: you can check that if $\zeta^k$ occurs in
a period, then so does $\zeta^{17-k} = \zeta^{-k} = \overline{\zeta^k}$, so that each period is a
sum of terms of form $z + \bar{z}$, and hence a real number.


If you use a CAS, you can perform all of these calculations in $\mathbb{Q}[x]/(\Phi(x))$.
You can also check that
r >2 0 = 7n4 (Exercise 7.60 
on page 327). 


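$\mathbb{Q}(\zeta) \supseteq \mathbb{Q}(\eta_{8,1}) \supseteq \mathbb{Q}(\eta_{4,1}) \supseteq \mathbb{Q}(\eta_{2,1}) \supseteq \mathbb{Q}$ (each step of degree 2)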


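Hence $\eta_{2,0}$ and $\eta_{2,1}$ are roots of $x^2 + x - 4$, and so

$[\mathbb{Q}(\eta_{2,1}) : \mathbb{Q}] \le 2$.

The first step in our tower is $\mathbb{Q} \subseteq \mathbb{Q}(\eta_{2,1})$. Note that $\eta_{2,0} = -4/\eta_{2,1}$, so that
$\eta_{2,0} \in \mathbb{Q}(\eta_{2,1})$. Next, we move up to the periods of length 4. You can check
(Exercise 7.61 on page 327) that

$\eta_{4,1} + \eta_{4,3} = \eta_{2,1}$

$\eta_{4,1}\eta_{4,3} = -1$.

Hence, $\eta_{4,1}$ and $\eta_{4,3}$ are roots of $x^2 - \eta_{2,1}x - 1$, and

$[\mathbb{Q}(\eta_{4,1}) : \mathbb{Q}(\eta_{2,1})] \le 2$.

$\mathbb{Q}(\eta_{4,1})/\mathbb{Q}(\eta_{2,1})$ is the second story in our tower.

$\mathbb{Q} \subseteq \mathbb{Q}(\eta_{2,1}) \subseteq \mathbb{Q}(\eta_{4,1})$.

Note that $\eta_{4,3} = -1/\eta_{4,1}$, so that $\eta_{4,3} \in \mathbb{Q}(\eta_{4,1})$.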

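Up one more story, to the periods of length two: you can check
(Exercise 7.62 on page 327) that


$\eta_{8,1} + \eta_{8,5} = \eta_{4,1}$

$\eta_{8,1}\eta_{8,5} = \eta_{4,2}$.

So, $\eta_{8,1}$ and $\eta_{8,5}$ are roots of $x^2 - \eta_{4,1}x + \eta_{4,2}$. This says that

$[\mathbb{Q}(\eta_{8,1}) : \mathbb{Q}(\eta_{4,1}, \eta_{4,2})] \le 2$.

But, by Exercise 7.63 on page 327,

$\mathbb{Q}(\eta_{4,1}, \eta_{4,2}) = \mathbb{Q}(\eta_{4,1})$,

so that


$[\mathbb{Q}(\eta_{8,1}) : \mathbb{Q}(\eta_{4,1})] \le 2$.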

Assemble what we have built:

$\mathbb{Q} \subseteq \mathbb{Q}(\eta_{2,1}) \subseteq \mathbb{Q}(\eta_{4,1}) \subseteq \mathbb{Q}(\eta_{8,1}) \subseteq \mathbb{Q}(\zeta)$.

The degree of each extension is at most 2; since

$[\mathbb{Q}(\zeta) : \mathbb{Q}] = 16 = 2^4$,

all the degrees are equal to 2 (Theorem 7.27). Hence, we have constructed a
tower of fields, each quadratic over the one below, starting with Q and ending
with $\mathbb{Q}(\zeta)$. We have proved that $\zeta$ is constructible.
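
For readers who prefer a computer to a pencil, the relations used to build this tower can be checked numerically. The Python sketch below uses floating-point complex arithmetic rather than exact computation in a CAS; the helper names `period` and `close` are illustrative assumptions, and the exponent lists are the ones displayed earlier for the periods.

```python
import cmath
import math

zeta = cmath.exp(2j * math.pi / 17)   # primitive 17th root of unity

def period(exponents):
    """Sum of zeta**k over the given exponents."""
    return sum(zeta**k for k in exponents)

def close(x, y, tol=1e-9):
    return abs(x - y) < tol

eta2 = {0: period([1, 9, 13, 15, 16, 8, 4, 2]),
        1: period([3, 10, 5, 11, 14, 7, 12, 6])}
eta4 = {0: period([1, 13, 16, 4]), 1: period([3, 5, 14, 12]),
        2: period([9, 15, 8, 2]), 3: period([10, 11, 7, 6])}
eta8 = {1: period([3, 14]), 5: period([5, 12])}

checks = {
    "eta2 sum":     close(eta2[0] + eta2[1], -1),
    "eta2 product": close(eta2[0] * eta2[1], -4),
    "eta4 sum":     close(eta4[1] + eta4[3], eta2[1]),
    "eta4 product": close(eta4[1] * eta4[3], -1),
    "eta8 sum":     close(eta8[1] + eta8[5], eta4[1]),
    "eta8 product": close(eta8[1] * eta8[5], eta4[2]),
}
```

Every entry of `checks` evaluates to `True`, and each of these periods has imaginary part zero up to rounding, as the margin note on real periods predicts.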

Theorem 7.61. A regular \l-gon can be inscribed in the unit circle with ruler 
and compass. 
One detail that remains is to see what method Gauss used to assign different
powers of $\zeta$ to each period; how did Gauss decide which powers of $\zeta$ should
occur in each $\eta_{e,k}$? The answer comes from Galois theory (a subject we only
briefly touch on in Chapter 9). He employed an ingenious method using the
fact that 3 is a primitive element in $\mathbb{F}_{17}$; that is, every nonzero element in $\mathbb{F}_{17}$
is a power of 3.


k           0  1  2  3   4   5  6   7   8   9   10  11  12  13  14  15
3^k mod 17  1  3  9  10  13  5  15  11  16  14  8   7   4   12  2   6
One of the reasons has already been mentioned: each period should contain
a sum of terms, each of form $\zeta^k + \zeta^{-k}$. This ensures that every story in
our tower except the last is contained in R, so we “save the complex step” for
the end (why is this a good thing?).


Gauss used this special property of 3 to define the periods: if ef = 16, there
are e periods of length f defined by

$\eta_{e,k} = \sum_{j=0}^{f-1} \zeta^{3^{k+je}}$, for $0 \le k \le e - 1$.

So, for example,

$\eta_{8,4} = \zeta^{3^{4+0\cdot 8}} + \zeta^{3^{4+1\cdot 8}}$

$= \zeta^{81} + \zeta^{531441}$

$= \zeta^{13} + \zeta^{4}$.

Sufficiency of the following theorem, a feat the Greeks would have envied, 
was discovered by Gauss around 1796. He claimed necessity as well, but none 
of his published papers contains a complete proof of it. The first published 
proof of necessity is due to Wantzel, in 1837. 


Theorem 7.62 (Gauss-Wantzel). If p is an odd prime, then a regular p-gon
is constructible if and only if p is a Fermat prime; that is, a prime of the form
$p = 2^{2^t} + 1$ for some $t \ge 0$.

Proof. We only prove necessity. The problem is whether $z = e^{2\pi i/p}$ is
constructible. Now z is a root of the cyclotomic polynomial $\Phi_p(x)$, which is an
irreducible polynomial in Q[x] of degree $p - 1$, by Corollary 6.68.

Since z is constructible, Corollary 7.53 gives $p - 1 = 2^s$ for some s, so that

$p = 2^s + 1$.

We claim that s itself is a power of 2. Otherwise, there is an odd number k > 1
with s = km. Now k odd implies that $-1$ is a root of $x^k + 1$; in fact, there is
a factorization in Z[x]:

$x^k + 1 = (x + 1)(x^{k-1} - x^{k-2} + x^{k-3} - \cdots + 1)$.

Thus, setting $x = 2^m$ gives a forbidden factorization of p in Z:

$p = 2^s + 1 = (2^m)^k + 1$

$= [2^m + 1][(2^m)^{k-1} - (2^m)^{k-2} + (2^m)^{k-3} - \cdots + 1]$. ■
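
The forbidden factorization is easy to observe for small exponents. In this Python sketch (function names are illustrative assumptions), s is split as s = km with k odd and m a power of 2; whenever k > 1, the number $2^m + 1$ is a proper factor of $2^s + 1$.

```python
def odd_part_split(s):
    """Return (k, m) with s = k * m, k odd, and m a power of 2."""
    m = 1
    while s % 2 == 0:
        s //= 2
        m *= 2
    return s, m

def forbidden_factor(s):
    """A proper factor 2**m + 1 of 2**s + 1 when s is not a power of 2, else None."""
    k, m = odd_part_split(s)
    if k == 1:
        return None        # s is a power of 2; the argument gives no factor
    return 2**m + 1

# Example: s = 6 = 3 * 2, so 2^2 + 1 = 5 divides 2^6 + 1 = 65.
factor_for_6 = forbidden_factor(6)
```

For s a power of 2 the function returns `None`, which is consistent with the proof: only such exponents can yield a prime of the form $2^s + 1$.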

The only known Fermat primes are 3, 5, 17, 257, and 65537. It follows from 
Theorem 7.62, for example, that it is impossible to construct regular 7-gons, 
11-gons, or 13-gons.


Gauss established suffi- 
ciency by generalizing the 
construction of the 17-gon, 
giving explicit formulas of 
the $\eta_{e,k}$ for all pairs $(e, f)$
with $ef = p - 1$ (see [33],
pp. 200-206).
Further results. Fermat conjectured that all numbers of the form
$F_m = 2^{2^m} + 1$ are prime, but Euler factored

$$F_5 = 2^{32} + 1 = 4{,}294{,}967{,}297 = 641 \times 6700417.$$

We now know that $F_m$ is composite for $5 \le m \le 32$, but it is unknown whether
$F_{33}$ is prime. The largest $F_m$ that has been shown to be composite (as of 2011)
is $F_{2543548}$, and the latest conjecture is that there are only finitely many Fermat
primes.
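Euler's factorization and the primality of the first five Fermat numbers are easy to confirm by machine. The snippet below is a minimal sketch using naive trial division; the helper names `fermat` and `is_prime` are chosen here for illustration.

```python
def fermat(m):
    """The m-th Fermat number F_m = 2^(2^m) + 1."""
    return 2 ** (2 ** m) + 1

def is_prime(n):
    """Naive trial division; adequate for the small numbers checked here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

for m in range(5):
    print(m, fermat(m), is_prime(fermat(m)))   # F_0, ..., F_4 are all prime

print(fermat(5) == 641 * 6700417)              # Euler's factorization of F_5
```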


The strongest known result is: 

Theorem 7.63. A regular $n$-gon is constructible with ruler and compass if and
only if $n = 2^k p_1 \cdots p_t$, where $k \ge 0$ and the $p_i$’s are distinct Fermat primes.

Proof. See [15], page 97. ■ 
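Theorem 7.63 turns constructibility into a simple divisibility test. The sketch below assumes the list of Fermat primes is exactly the five known ones; since the text notes that $F_m$ is composite for $5 \le m \le 32$, that assumption is harmless for any $n$ smaller than $F_{33}$. The function name `is_constructible` is illustrative.

```python
KNOWN_FERMAT_PRIMES = (3, 5, 17, 257, 65537)

def is_constructible(n):
    """True if n = 2^k * (product of distinct known Fermat primes), for n >= 3."""
    if n < 3:
        raise ValueError("a regular n-gon needs n >= 3")
    while n % 2 == 0:            # strip the power of 2
        n //= 2
    for p in KNOWN_FERMAT_PRIMES:
        if n % p == 0:
            n //= p
            if n % p == 0:       # a repeated Fermat prime is not allowed
                return False
    return n == 1                # anything left over is a forbidden factor

print([n for n in range(3, 21) if is_constructible(n)])
```

For example, 7, 9, 11, 13, 14, 18, and 19 are rejected, in agreement with the remark that regular 7-gons, 11-gons, and 13-gons cannot be constructed.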


Exercises 

7.50 Explain how to carry out each of the following constructions with ruler and com- 
pass. Prove that your method works. 

(i) Copy a segment. 

(ii) Copy an angle. 

(iii) Construct a line parallel to a given one through a given point not on the line. 

(iv) Construct a line perpendicular to a given one through a given point either on 
or off the line. 

7.51 * 

(i) Prove that every lattice point ( m,n ) in the plane is constructible. Conclude 
that every Gaussian integer is constructible. 

(ii) Prove that every Eisenstein integer is constructible. 

7.52 Suppose that $\ell$ and $\ell'$ are lines with equations $ax + by = c$ and $dx + ey = f$,
and suppose that the coefficients of the equations are all in a field $k$.

(i) What condition on the coefficients guarantees that $\ell$ and $\ell'$ intersect in a
unique point?

(ii) If $\ell$ and $\ell'$ intersect in a unique point $P$, show that $P$ is a point whose
coordinates are in $k$.

7.53 If the quadratic polynomial $ax^2 + bx + c$ has coefficients in a field $k$, show that
its roots lie in a quadratic extension of $k$.

7.54 Given a segment of length a, show how to construct a segment of length a/5. 

7.55 Given a segment of length a, show how to construct a segment of length a/ n, 
where n is any positive integer. 

7.56 Show how to construct segments of length 

(i) 75 (ii) 3 + 75 

(iii) 73+75 (iv) 71+21 

< v > ^7 
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7.57 Show how to construct the complex numbers 

(i) 1 + i 

(ii) (1 + i ) 

(iii) (1 + i ) 

(iv) cos 15° + i sin 15° 

(v) cos 22.5° + i sin 22.5° 

(vi) cos 36° + i sin 36°. 

7.58 Show that the side of a regular decagon inscribed in the unit circle is constructible. 

7.59 Show that if the side-length of a regular $n$-gon inscribed in the unit circle is
constructible, so is the side-length of a regular $2n$-gon inscribed in the unit circle.

7.60 * Using the notation of this section, show that

$\eta_{2,0}\,\eta_{2,1} = -4$ and $\eta_{2,0} > \eta_{2,1}$.

7.61 * Using the notation of this section, show that

$\eta_{4,1} + \eta_{4,3} = \eta_{2,1}$ and $\eta_{4,1}\,\eta_{4,3} = -1$.

7.62 * Using the notation of this section, show that

$\eta_{8,1} + \eta_{8,5} = \eta_{4,1}$ and $\eta_{8,1}\,\eta_{8,5} = \eta_{4,2}$.

7.63 * Using the notation of this section, show that

$\eta_{4,1}^3 - 6\eta_{4,1} + 3 = -2\eta_{4,2}$,

so that $\eta_{4,2} \in \mathbb{Q}(\eta_{4,1})$.

7.64 Let $\zeta = \zeta_{17}$ and let $k$ be a nonnegative integer. Show that

$\zeta^k + \zeta^{-k} = 2\cos(2k\pi/17)$.

7.65 Show that 3 is a primitive element for Z5, and apply this to Example 7.60. 

7.66 Find the minimal polynomial over $\mathbb{Q}$ of $\eta_{8,k}$ for all $0 \le k \le 7$.

7.67 Show that if (m, n) = Land if and are constructible, so is . 

Hint. Use Theorem 7.63. 





Cyclotomic Integers 


After proving Corollary 1.8, the special case of Fermat’s Last Theorem for 
exponent n = 4, we observed that the full theorem would follow if we could 
prove, for every odd prime p, that there are no positive integers a,b,c with 
$a^p + b^p = c^p$. It is natural to factor this expression as in Exercise 3.50(ii) on
page 115: for odd $p$, we have

$$(a + b)(a + \zeta b) \cdots (a + \zeta^{p-1} b) = c^p, \qquad (8.1)$$

where $\zeta = e^{2\pi i/p}$ is a $p$th root of unity. We didn’t have the language of rings
at the time but, later, you showed in Exercise 4.65 on page 168 that the ring of
cyclotomic integers $\mathbb{Z}[\zeta]$ is a domain. How could we begin to use this observation?
Recall Exercise 2.14 on page 59: if $ab = c^n$ in $\mathbb{Z}$, where $n$ is a positive integer
and $a, b$ are relatively prime, then both $a$ and $b$ are $n$th powers. If $\mathbb{Z}[\zeta]$ behaved
like $\mathbb{Z}$ and the factors on the left-hand side of Eq. (8.1) were pairwise relatively
prime, then all the factors $a + \zeta^j b$ would be $p$th powers; that is,

$$a + \zeta^j b = d_j^{\,p}$$

for some $d_j \in \mathbb{Z}[\zeta]$. For example, consider the case $p = 3$, so that
$\mathbb{Z}[\zeta] = \mathbb{Z}[\omega]$ is the ring of Eisenstein integers. The factorization is

$$(a + b)(a + \omega b)(a + \omega^2 b) = c^3.$$

If the factors on the left-hand side are pairwise relatively prime, then each
of them is a cube of an Eisenstein integer. We can even say something if the
factors are not relatively prime: assuming that $\mathbb{Z}[\omega]$ has unique factorization,
any prime divisor of $c$ must occur three times on the left-hand side (prime now
means a prime in $\mathbb{Z}[\omega]$; that is, an element whose only divisors are units and
associates; we will use the term prime here instead of irreducible).

But does the ring of cyclotomic integers behave as Z? To solve Exercise 
2.14, we need to use the Fundamental Theorem of Arithmetic: factorization 
into primes exists and is essentially unique. It turns out that some rings of cy- 
clotomic integers do enjoy unique factorization into primes, but some do not. 
Indeed, it is known (see [36], p. 7) that $\mathbb{Z}[\zeta_{23}]$ does not have unique
factorization.

It’s clear that we need a more thorough investigation of the arithmetic in
rings of cyclotomic integers. In particular, we already know the units in $\mathbb{Z}[i]$
and $\mathbb{Z}[\omega]$ (Example 6.3); what are the primes?

In Section 8.1, we retrace the by-now-familiar developments in Chapters 1
and 6 to establish division algorithms in $\mathbb{Z}[i]$ and $\mathbb{Z}[\omega]$ (using norm to
measure size). Even though these are the easiest rings of cyclotomic integers, this
will give us a clue how to proceed with $\mathbb{Z}[\zeta]$ for other roots of unity $\zeta$. There
will be a bonus: we’ll be able to prove Fermat’s Two-Square Theorem that
characterizes all primes in $\mathbb{Z}$ which are sums of two squares.

As is our custom (because it is so useful), we’ll generalize from these and
the earlier examples of $\mathbb{Z}$ and $k[x]$, where $k$ is a field, to define a Euclidean
domain: a domain having a generalized division algorithm. We’ll show that
every Euclidean domain is a PID, so that, by Theorem 6.50, Euclidean domains
are UFDs and thus have unique factorization.

In Section 8.2, we’ll see that there are primes in $\mathbb{Z}[i]$ and $\mathbb{Z}[\omega]$ that are not
ordinary integers. On the other hand, some primes in $\mathbb{Z}$ remain prime in the
larger rings, while some split into non-unit factors. We’ll then make a complete
analysis of this phenomenon for $\mathbb{Z}[i]$ and sketch the analogous theory for $\mathbb{Z}[\omega]$.

In Section 8.3, we’ll prove Fermat’s theorem for exponent 3. The fact that
there are no non-trivial integer solutions to $x^3 + y^3 = z^3$ is attributed to Euler;
we’ll prove the result making basic use of the arithmetic of $\mathbb{Z}[\omega]$.

In Section 8.4, we’ll briefly sketch how the proof for exponent 3 generalizes
to prime exponent $p$ when the ring of cyclotomic integers $\mathbb{Z}[\zeta_p]$ is a
UFD, where $\zeta_p = e^{2\pi i/p}$. But there are primes $p$ for which $\mathbb{Z}[\zeta_p]$ does not
have unique factorization. What then? We’ll finish this section with a brief
discussion about how Kummer’s construct of ideal numbers (which Dedekind
recognized as equivalent to what we now call ideals, which is why they are
so-called) could be used to restore a kind of unique factorization to $\mathbb{Z}[\zeta_p]$.

Finally, in Section 8.5 we develop the machinery to prove a lovely theorem 
of Fermat that determines the number of ways a positive integer can be written 
as the sum of two perfect squares. 


8.1 Arithmetic in Gaussian 
and Eisenstein Integers 

We begin by showing that $\mathbb{Z}[i]$ and $\mathbb{Z}[\omega]$ have generalized division algorithms.
Actually, we’ll show that long division is possible in these rings; that is, there
are quotients and remainders. However, quotients and remainders are not
necessarily unique; stay tuned.

Given two Gaussian integers $z$ and $w$, can we find Gaussian integers $q$ and
$r$ so that $w = qz + r$, where $r$ is “smaller than” $z$? The obvious way to
compare size in $\mathbb{C}$ is with absolute values, but it’s easier to calculate norms
(squares of absolute value); by Exercise 3.62 on page 127, $|r| < |z|$ if and
only if $N(r) < N(z)$. Let’s start with an example.

Example 8.1. Take $z = -19 + 48i$ and $w = -211 + 102i$. Can we find $q$
and $r$ so that $w = qz + r$, where $N(r) < N(z)$? We can certainly divide $z$ into
$w$ in $\mathbb{Q}[i]$; it’s just that $w/z$ may not be a Gaussian integer. In fact,

$$\frac{-211 + 102i}{-19 + 48i} = \frac{(-211 + 102i)(-19 - 48i)}{(-19 + 48i)(-19 - 48i)} = \frac{8905 + 8190i}{2665} = \frac{137}{41} + \frac{126}{41}\,i.$$

The idea is to take $q$ to be the Gaussian integer closest to $w/z$ in the complex
plane, and then to find an $r$ that makes up the difference. Since

$$\frac{w}{z} = \frac{137}{41} + \frac{126}{41}\,i \approx 3.34 + 3.07i,$$

we’ll take $q = 3 + 3i$. What about $r$? There’s no choice; since we want
$w = qz + r$, set $r = w - qz$:

$$r = w - qz = (-211 + 102i) - (3 + 3i)(-19 + 48i) = -10 + 15i.$$

By construction, $w = qz + r$. What’s more, that $q$ is the closest Gaussian
integer to $w/z$ implies, as we’ll see in the proof of the next proposition, that
$N(r) < N(z)$. Indeed, $N(z)$ is much bigger than $N(r)$ in this example, because
$w/z$ is so close to $q$:

$$N(r) = (-10)^2 + 15^2 = 325 \quad\text{and}\quad N(z) = (-19)^2 + 48^2 = 2665. \quad ▲$$


This method for choosing q and r works in general. 

Proposition 8.2 (Generalized Division Algorithm). If $z$ and $w$ are Gaussian
integers with $z \ne 0$, then there exist Gaussian integers $q$ and $r$ such that

$$w = qz + r \quad\text{and}\quad N(r) < N(z).$$

Proof. Suppose that $w/z = a + bi$, where $a$ and $b$ are rational numbers (but
not necessarily integers). As in Example 8.1, take $q$ to be a Gaussian integer
closest to $w/z$ in the complex plane; more precisely, choose integers $m$ and $n$
so that

$$|a - m| \le \tfrac12 \quad\text{and}\quad |b - n| \le \tfrac12,$$

and let $q = m + ni$. Now define $r$ to be the difference:

$$r = w - qz.$$

Clearly, $w = qz + r$, so the only thing to check is whether $N(r) < N(z)$. To
this end, we have

$$N(r) = N(w - qz) = N\!\left(z\left(\frac{w}{z} - q\right)\right) = N(z)\,N\!\left(\frac{w}{z} - q\right).$$

But $w/z - q = (a - m) + (b - n)i$, so that

$$N\!\left(\frac{w}{z} - q\right) = (a - m)^2 + (b - n)^2 \le \tfrac14 + \tfrac14 < 1.$$

It follows that $N(r) < N(z)$. ■
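The proof is constructive, so it can be turned into a short program. What follows is a sketch, not the book's CAS package: Gaussian integers are represented as pairs `(a, b)` standing for $a + bi$, and the names `g_norm`, `g_mul`, and `g_divmod` are hypothetical helpers introduced here. Exact rational arithmetic avoids any floating-point rounding issues.

```python
from fractions import Fraction

def g_norm(z):
    """Norm N(a + bi) = a^2 + b^2."""
    a, b = z
    return a * a + b * b

def g_mul(x, y):
    """Product (a + bi)(c + di) as a pair."""
    (a, b), (c, d) = x, y
    return (a * c - b * d, a * d + b * c)

def g_divmod(w, z):
    """Return (q, r) with w = q*z + r and N(r) < N(z), for z != 0."""
    n = g_norm(z)
    if n == 0:
        raise ZeroDivisionError("division by the zero Gaussian integer")
    (a, b), (c, d) = w, z
    # w/z = w * conj(z) / N(z), computed exactly
    re = Fraction(a * c + b * d, n)
    im = Fraction(b * c - a * d, n)
    # round() picks a nearest integer; exact ties go to the even neighbor
    q = (round(re), round(im))
    qz = g_mul(q, z)
    return q, (a - qz[0], b - qz[1])

q, r = g_divmod((-211, 102), (-19, 48))
print(q, r, g_norm(r))   # (3, 3) (-10, 15) 325, as in Example 8.1
```

When $w/z$ is equidistant from several Gaussian integers, this sketch simply accepts whichever one Python's tie-breaking rule selects; any of the closest choices satisfies the proposition.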


How to Think About It. The earlier statements of the division algorithms 
for $\mathbb{Z}$ (Theorem 1.15) and for $k[x]$, where $k$ is a field (Theorem 6.11), differ
from that in Proposition 8.2; the latter statement does not assert uniqueness of 
quotient and remainder. 

In fact, the way q and r are constructed shows that there may be several 
choices for q and r — locate w/z inside a unit square in the complex plane 
whose vertices are Gaussian integers, and then pick a closest vertex. There 
may be several of these, as the next example shows. Luckily, we won’t need 
uniqueness of quotients and remainders to get unique factorization into primes. 
Example 8.3. If $z = 2 + 4i$ and $w = -9 + 17i$, then

$$\frac{w}{z} = \frac{5}{2} + \frac{7}{2}\,i = 2.5 + 3.5i.$$

In contrast to Example 8.1, $w/z$ sits smack in the middle of a unit square whose
vertices are Gaussian integers and, hence, there are four choices for $q$, namely

$$2 + 3i, \quad 3 + 3i, \quad 3 + 4i, \quad 2 + 4i,$$

and there are four corresponding divisions, namely

$$-9 + 17i = (2 + 3i)(2 + 4i) + (-1 + 3i);$$

$$-9 + 17i = (3 + 3i)(2 + 4i) + (-3 - i);$$

$$-9 + 17i = (3 + 4i)(2 + 4i) + (1 - 3i);$$

$$-9 + 17i = (2 + 4i)(2 + 4i) + (3 + i).$$

All of these work. In fact, all the remainders have (the same) norm $10 < 20 = N(z)$.
Even more: the remainders are all associates. Is this an accident? See
Exercises 8.1 through 8.4 on page 336. ▲


Alas, there are other rings $\mathbb{Z}[\zeta]$ of cyclotomic integers which do not have
a generalized division algorithm.

There is an analogous result for the Eisenstein integers $\mathbb{Z}[\omega]$, and its proof
is almost identical to that for the Gaussian integers. Recall that
$\overline{c + d\omega} = c + d\omega^2 = c + d(-1 - \omega)$, and that

$$N(c + d\omega) = c^2 - cd + d^2.$$

Proposition 8.4 (Generalized Division Algorithm). If $z$ and $w$ are Eisenstein
integers with $z \ne 0$, then there exist Eisenstein integers $q$ and $r$ such that

$$w = qz + r \quad\text{and}\quad N(r) < N(z).$$

Proof. Suppose that $w/z = a + b\omega$, where $a$ and $b$ are rational numbers (but
not necessarily integers). Take $q$ to be an Eisenstein integer closest to $w/z$ in
the complex plane (with respect to the norm); more precisely, choose integers
$m$ and $n$ so that

$$|a - m| \le \tfrac12 \quad\text{and}\quad |b - n| \le \tfrac12,$$

and let $q = m + n\omega$. Now define $r$ to be the difference

$$r = w - qz.$$

Clearly, $w = qz + r$, so the only thing to check is whether $N(r) < N(z)$. To
this end, we have

$$N(r) = N(w - qz) = N\!\left(z\left(\frac{w}{z} - q\right)\right) = N(z)\,N\!\left(\frac{w}{z} - q\right).$$

But $w/z - q = (a - m) + (b - n)\omega$, so that

$$N\!\left(\frac{w}{z} - q\right) = (a - m)^2 - (a - m)(b - n) + (b - n)^2 \le \tfrac14 + \tfrac14 + \tfrac14 < 1.$$

It follows that $N(r) < N(z)$. ■
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Why can’t we modify the proof of Proposition 8.4 to prove the result for 
every ring of cyclotomic integers? The short answer is that there are counter- 
examples. But the reason the proof fails to generalize is that we can’t verify 
$N(r) < N(z)$ in every $\mathbb{Z}[\zeta]$.


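Example 8.5. Let’s divide $w = 91 + 84\omega$ by $z = 34 + 53\omega$. First calculate
$w/z$ in $\mathbb{Q}[\omega]$:

$$\frac{91 + 84\omega}{34 + 53\omega} = \frac{(91 + 84\omega)(34 + 53\omega^2)}{(34 + 53\omega)(34 + 53\omega^2)} = \frac{2723 - 1967\omega}{2163} = \frac{389}{309} - \frac{281}{309}\,\omega \approx 1.26 - 0.91\omega.$$

Now set $q = 1 - \omega$:

$$r = w - zq = 4 + 12\omega.$$

You can check that

$$N(z) = 34^2 - 34 \cdot 53 + 53^2 = 2163 \quad\text{and}\quad N(r) = 4^2 - 4 \cdot 12 + 12^2 = 112. \quad ▲$$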
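The same rounding recipe works for Eisenstein integers once multiplication, conjugation, and norm are written in terms of $\omega$. The sketch below represents $a + b\omega$ as the pair `(a, b)` and uses $\omega^2 = -1 - \omega$; the helper names are illustrative and are not taken from the CAS package mentioned in the text.

```python
from fractions import Fraction

def e_norm(x):
    """N(a + b*omega) = a^2 - a*b + b^2."""
    a, b = x
    return a * a - a * b + b * b

def e_mul(x, y):
    """(a + b*w)(c + d*w) with w^2 = -1 - w."""
    (a, b), (c, d) = x, y
    return (a * c - b * d, a * d + b * c - b * d)

def e_conj(x):
    """Conjugate of a + b*w is a + b*w^2 = (a - b) - b*w."""
    a, b = x
    return (a - b, -b)

def e_divmod(w, z):
    """Return (q, r) with w = q*z + r and N(r) < N(z), for z != 0."""
    n = e_norm(z)
    if n == 0:
        raise ZeroDivisionError("division by the zero Eisenstein integer")
    num = e_mul(w, e_conj(z))                    # w/z = num / N(z)
    q = (round(Fraction(num[0], n)), round(Fraction(num[1], n)))
    qz = e_mul(q, z)
    return q, (w[0] - qz[0], w[1] - qz[1])

print(e_divmod((91, 84), (34, 53)))   # ((1, -1), (4, 12)), as in Example 8.5
```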


How to Think About It. If $z, w$ are either in $\mathbb{Z}[i]$ or in $\mathbb{Z}[\omega]$, we could
iterate the respective generalized division algorithms, as we did in $\mathbb{Z}$ or $k[x]$,
to obtain Euclidean algorithms giving a greatest common divisor $d$ of $z$ and
$w$; moreover, $d$ can be expressed as a linear combination of $z$ and $w$ ensuring,
as in earlier instances of this argument, that any common divisor of $z$ and $w$ is
a factor of $d$. We’ll give an example of such a calculation shortly.

Along the way, there may be choices to be made for quotients and remainders,
possibly resulting in different “greatest” common divisors $d$. We ran into
this situation before: the Euclidean Algorithm in $k[x]$ produces a gcd up to a
unit factor. The same is true in $\mathbb{Z}[i]$ and $\mathbb{Z}[\omega]$, although it may not be obvious
at this point because of the twists and turns that the Euclidean Algorithm might 
take. This is another example in which a more abstract setting can make things 
clearer (one reason for the added clarity is that abstraction casts away noise, 
allowing you to focus on the heart of a problem). 


Euclidean Domains 

Looking at our main examples ($\mathbb{Z}$, $k[x]$, $\mathbb{Z}[i]$, $\mathbb{Z}[\omega]$), we see that one key
to a division algorithm is a measure of size: absolute value for $\mathbb{Z}$, degree for
$k[x]$, norm for $\mathbb{Z}[i]$ and $\mathbb{Z}[\omega]$. Now we generalize.

Definition. A Euclidean domain is a domain $R$ equipped with a size function

$$\partial \colon R - \{0\} \to \mathbb{N}$$

such that, for all $a, b \in R$ with $a \ne 0$, there exist $q$ and $r$ in $R$ with

$$b = qa + r,$$

where either $r = 0$ or $\partial(r) < \partial(a)$.

$\partial$ is defined on the nonzero elements of $R$ and takes nonnegative integer
values.

As with $\mathbb{Z}[i]$, there may be several choices for $q$ when dividing Eisenstein
integers (see Exercise 8.5 on page 337).






Some size functions have extra properties. For example, when $R$ is a domain,
then $R[x]$ is a domain, and degree (which is a size function on $R[x]$)
satisfies $\deg(fg) = \deg(f) + \deg(g)$, while norm $N$ (which is a size function
on $\mathbb{Z}[i]$ and $\mathbb{Z}[\omega]$) satisfies $N(\alpha\beta) = N(\alpha)N(\beta)$. On the other hand, if $\partial$ is a
size function of a Euclidean domain $R$, then so is $\partial'$, where $\partial'(a) = \partial(a) + 1$
for all $a \in R - \{0\}$. It follows that a size function may have no algebraic
properties; moreover, there may be no elements in $R$ having size 0.

Euclidean domains have nice properties. The proof of the next proposition 
is essentially the same as that of Theorem 1.19. 

Proposition 8.6. Every Euclidean domain is a principal ideal domain. 

Proof. Suppose that $R$ is a Euclidean domain with size function $\partial$. We want to
show that every ideal $I$ in $R$ is principal. If $I = \{0\}$, then $I$ is principal, and
so we can assume that $I$ contains nonzero elements. The set

$$S = \{\partial(z) : z \in I,\ z \ne 0\}$$

is a set of nonnegative integers and, hence, it has a least element; call it $m$.
Choose $d$ to be any element of $I$ of size $m$. We claim that $I = (d)$.

Clearly, $(d) \subseteq I$. To get the reverse inclusion, suppose that $z \in I$ is not 0;
we must show that $z \in (d)$. Now there are $q$ and $r$ such that

$$z = qd + r,$$

where either $r = 0$ or $\partial(r) < \partial(d)$. But $r \in I$, because $r = z - dq$ and both
$z$ and $d$ are in $I$. But $\partial(d)$ is the smallest size among nonzero elements of $I$; hence,
$r = 0$, and $z = qd \in (d)$. ■

Corollary 8.7 (Euclid’s Lemma). Let $R$ be a Euclidean domain with $a, b \in R$.
If $p \in R$ is irreducible and $p \mid ab$, then $p \mid a$ or $p \mid b$.

Proof. This is a direct consequence of Theorem 6.47. ■

Corollary 8.8. Every Euclidean domain is a unique factorization domain ; that 
is, every nonzero non-unit has a factorization into irreducibles that is essen- 
tially unique. 

Proof. This is a direct consequence of Theorem 6.50: every PID is a unique 
factorization domain. ■ 

Corollary 8.8 probably piques your curiosity about what primes look like in 
Euclidean domains. We’ll consider this question for $\mathbb{Z}[i]$ and $\mathbb{Z}[\omega]$ in the next
section. 


How to Think About It. Points about the development so far. 

• Euclidean domains have generalized division algorithms, but they are not
necessarily algorithms in the technical sense. They are procedures for
computing quotients and remainders, but the division procedures, even for $\mathbb{Z}[i]$
and $\mathbb{Z}[\omega]$, are not deterministic: there is a choice about how to calculate
quotients and remainders.

• In Section 1.3, we studied a direct path in Euclid's Elements from the Divi- 
sion Algorithm in Z to the Fundamental Theorem of Arithmetic. This path 
can be followed in a much more general setting. We just saw that every Eu- 
clidean domain enjoys Euclid’s Lemma and a fundamental theorem. 

• One way to show that a domain is a PID is to show that it is Euclidean — 
indeed, this is one of the most important uses of this notion. On the other 
hand, it’s hard from first principles to show that a domain is not Euclidean 
(you have to show that no size function exists). Often, the easiest way to 
show that a domain is not Euclidean is to show that it’s not a PID. So, for 
example, Z[x] is not a Euclidean domain. 

• There are PIDs that are not Euclidean, so that the converse of Proposition 8.6
is false. Motzkin found a property of Euclidean domains that can be
defined without mentioning its size function. He called an element $d$ in an
arbitrary domain $R$ a universal side divisor if $d$ is not a unit and, for
every $r \in R$, either $d \mid r$ or there is some unit $u \in R$ with $d \mid (r + u)$.
He then proved that every Euclidean domain contains a universal side
divisor, namely any non-unit of smallest size. Now it was known that if
$\alpha = \tfrac12(1 + \sqrt{-19})$, then the ring $\mathbb{Z}[\alpha]$ is a PID. Motzkin then showed that $\mathbb{Z}[\alpha]$
has no universal side divisors, and he concluded that $\mathbb{Z}[\alpha]$ is a PID that is
not a Euclidean domain (see Wilson, A principal ideal ring that is not a
Euclidean ring, Math. Magazine 46 (1973), 34-38, and Williams, Note
on non-Euclidean principal ideal domains, Math. Magazine 48 (1975),
176-177).


The fact that a Euclidean domain is a PID allows us to talk about gcd’s,
thanks to Theorem 6.46. Using exactly the same logic as in Chapter 1, we 
can iterate division, creating a Euclidean algorithm that finds a gcd for us: 
just move factors on one line southwest on the next line (as in the next 
example). 


Example 8.9. Building on the calculation in Example 8.5, let’s find a gcd for
$91 + 84\omega$ and $34 + 53\omega$ in $\mathbb{Z}[\omega]$. We’ll use the algorithm outlined in
Proposition 8.4 to carry out the divisions (a CAS is very useful here). There are four
equations, which we present in “southwestern style”: if a row has the form
$f = qh + r$, then the next row moves $h$ and $r$ southwest and looks like
$h = q'r + r'$.

$$91 + 84\omega = (1 - \omega)(34 + 53\omega) + (4 + 12\omega)$$

$$34 + 53\omega = (3 - \omega)(4 + 12\omega) + (10 + 9\omega)$$

$$4 + 12\omega = (1 + \omega)(10 + 9\omega) + (3 + 2\omega)$$

$$10 + 9\omega = (4 + \omega)(3 + 2\omega).$$

Recall that in a general PID, the gcd of two elements $a$ and $b$ is a
generator $d$ of the principal ideal consisting of all linear
combinations of $a$ and $b$; in symbols, $(a, b) = (d)$.

Appendix A.6 outlines a package for a CAS that allows you to calculate in
$\mathbb{Z}[\omega]$.

Here is a second format, arranging the calculations as we did in $\mathbb{Z}$ in Chapter 1,
that shows more detail.

               1 − ω
    34 + 53ω ) 91 + 84ω
               87 + 72ω           3 − ω
                4 + 12ω ) 34 + 53ω
                          24 + 44ω          1 + ω
                          10 +  9ω ) 4 + 12ω
                                     1 + 10ω          4 + ω
                                     3 +  2ω ) 10 + 9ω
                                               10 + 9ω
                                                     0

So, we end with $3 + 2\omega$. Repeated application of Exercise 8.6 on page 337
shows that there is a chain of equalities of ideals:

$$(91 + 84\omega,\ 34 + 53\omega) = (34 + 53\omega,\ 4 + 12\omega) = (4 + 12\omega,\ 10 + 9\omega) = (10 + 9\omega,\ 3 + 2\omega) = (3 + 2\omega);$$

that is,

$$(91 + 84\omega,\ 34 + 53\omega) = (3 + 2\omega).$$

Thus, $3 + 2\omega$ is a gcd of $91 + 84\omega$ and $34 + 53\omega$.

While the calculations are a bit tedious, you can work the four equations
above backwards, as we did in Chapters 1 and 6, to write $3 + 2\omega$ as a linear
combination of $91 + 84\omega$ and $34 + 53\omega$. Using a CAS, we found that

$$3 + 2\omega = (5 + 3\omega)(91 + 84\omega) - (9 + 2\omega)(34 + 53\omega). \quad ▲$$
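For readers without the CAS package, the Euclidean Algorithm in the Eisenstein integers can be sketched in a few lines of Python. This is an illustration under stated assumptions: $a + b\omega$ is stored as the pair `(a, b)`, all function names are hypothetical, and the quotient at each step is obtained by rounding both coordinates of $w/z$. That rule may choose a different (but equally valid) quotient from the one used in the example, so the answer is only guaranteed to agree with $3 + 2\omega$ up to a unit.

```python
from fractions import Fraction

def e_norm(x):
    a, b = x
    return a * a - a * b + b * b          # N(a + b*w) = a^2 - ab + b^2

def e_mul(x, y):
    (a, b), (c, d) = x, y
    return (a * c - b * d, a * d + b * c - b * d)   # uses w^2 = -1 - w

def e_divmod(w, z):
    """Quotient and remainder with N(r) < N(z), rounding w/z coordinatewise."""
    n = e_norm(z)
    num = e_mul(w, (z[0] - z[1], -z[1]))  # w * conj(z)
    q = (round(Fraction(num[0], n)), round(Fraction(num[1], n)))
    qz = e_mul(q, z)
    return q, (w[0] - qz[0], w[1] - qz[1])

def e_gcd(a, b):
    """Iterate division until the remainder is 0; the last divisor is a gcd."""
    while b != (0, 0):
        a, b = b, e_divmod(a, b)[1]
    return a

UNITS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (-1, -1)]   # +-1, +-w, +-w^2

g = e_gcd((91, 84), (34, 53))
print(g, g in [e_mul(u, (3, 2)) for u in UNITS])   # an associate of 3 + 2w
```

The linear combination quoted in the example can be checked the same way: `e_mul((5, 3), (91, 84))` minus `e_mul((9, 2), (34, 53))` gives `(3, 2)`.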


How to Think About It. We’ve seen, in $\mathbb{Z}[i]$ or $\mathbb{Z}[\omega]$, that there are
sometimes choices for quotients and corresponding remainders in the generalized
division algorithms. Hence, there may be more than one way to implement the
Euclidean Algorithm and, so, more than one end result. But, thanks to
Proposition 6.45 and the fact that $\mathbb{Z}[i]$ and $\mathbb{Z}[\omega]$ are PIDs, any two gcd’s are associates.
See Exercise 8.7 below for an example.


Exercises 

8.1 Prove or Disprove and Salvage if Possible. Two Gaussian integers are associates 
if and only if they have the same norm. 

8.2 How many possible numbers of “closest Gaussian integers” to a complex number 
are there? For each number, give an example. 

8.3 Let $z$ and $w$ be Gaussian integers, and suppose that $q$ and $q'$ are Gaussian integers
equidistant from $w/z$ in the complex plane. Show that

$$N\!\left(\frac{w}{z} - q\right) = N\!\left(\frac{w}{z} - q'\right).$$

8.4 Let $z$ and $w$ be Gaussian integers, and suppose that $q$ and $q'$ are Gaussian integers
equidistant from $w/z$ in the complex plane. Are $w/z - q$ and $w/z - q'$ associates?
If so, prove it; if not, give a counterexample.

8.5 If $z \ne 0$ and $w$ are Eisenstein integers, how many possible numbers of quotients
$w/z$ are there in $\mathbb{Z}[\omega]$ satisfying the conditions of the division algorithm? For
each number, give an example.

8.6 * Let $R$ be a commutative ring. If $a$, $b$, $c$, and $d$ are elements of $R$ such that
$b = da + c$, show that there is equality of ideals $(b, a) = (a, c)$.

8.7 Let $z = 6 + 12i$ and $w = -13 + 74i$.

(i) Show that there are two $q$’s with $w = qz + r$ in the generalized division
algorithm, namely $q = 4 + 3i$ and $q = 5 + 3i$.

(ii) Apply the Euclidean Algorithm to find a gcd of $z$ and $w$ starting with $q = 4 + 3i$.

(iii) Apply the Euclidean Algorithm to find a gcd of $z$ and $w$ starting with $q = 5 + 3i$.

(iv) Are the two gcd’s associates in $\mathbb{Z}[i]$?

8.2 Primes Upstairs and Primes Downstairs

8.2 Primes Upstairs and Primes Downstairs 

We saw in the last chapter that an irreducible polynomial in $k[x]$ (for some
field $k$) may factor in $K[x]$ for some extension field $K/k$. For example, $x^2 + 1$
is irreducible in $\mathbb{R}[x]$ but it factors in $\mathbb{C}[x]$. A similar phenomenon occurs for
primes in $\mathbb{Z}$. Every ring of cyclotomic integers $R$ has $\mathbb{Z}$ as a subring, and a
prime $p \in \mathbb{Z}$ may factor in $R$.

Our goal in this section is to investigate primes in $\mathbb{Z}[i]$ and in $\mathbb{Z}[\omega]$, and the
obvious way to begin doing this is by studying primes downstairs, that is, in $\mathbb{Z}$,
and look at their behavior upstairs, that is, in rings of cyclotomic integers.

Corollary 8.8, the Fundamental Theorem for Euclidean domains, tells us 
that every element in such a domain has an essentially unique factorization 
into primes. 

Lemma 8.10. Let $R = \mathbb{Z}[\zeta_p]$ for any prime $p$. If $u \in R$, then $u$ is a unit in $R$
if and only if $N(u) = 1$.

Proof. If $u$ is a unit, there is $v \in R$ with $uv = 1$. Hence, $1 = N(uv) = N(u)N(v)$.
As $N(u)$ and $N(v)$ are positive integers, we must have $N(u) = 1 = N(v)$.

Conversely, suppose that $N(u) = 1$. Since $N(u) = u\overline{u}$, we have $u$ a unit
in $R$ (with inverse $\overline{u}$). ■

The actual factorization of Gaussian integers or of Eisenstein integers into
primes can be a tricky task, but here is a useful tool.

Proposition 8.11. Let $R = \mathbb{Z}[i]$ or $\mathbb{Z}[\omega]$. If $z \in R$ and $N(z)$ is prime in $\mathbb{Z}$,
then $z$ is prime in $R$.

Proof. We prove the contrapositive of the statement of the Proposition. If $z \in R$
and $z = wv$ for non-units $w$ and $v$, then Lemma 8.10 gives $N(w) > 1$ and
$N(v) > 1$. Hence,

$$N(z) = N(wv) = N(w)N(v),$$

and $N(z)$ is not prime in $\mathbb{Z}$. ■
The converse of Proposition 8.11 is false; we’ll soon see that 7 is prime
in $\mathbb{Z}[i]$, but that $N(7) = 49$.


Example 8.12. (i) In Example 8.1, we divided $w = -211 + 102i$ by
$z = -19 + 48i$ and got quotient $3 + 3i$ and remainder $-10 + 15i$. Let’s carry
out the rest of the Euclidean Algorithm to get a gcd of $z$ and $w$.

We use two formats for the Euclidean Algorithm finding the gcd. Here’s
the southwestern version.

$$w = (3 + 3i)z + (-10 + 15i)$$

$$z = (3 - i)(-10 + 15i) + (-4 - 7i)$$

$$-10 + 15i = (-1 - 2i)(-4 - 7i).$$

Hence, $\gcd(z, w) = 4 + 7i$ (we take an associate of $-4 - 7i$).

Here is a more detailed version of this calculation.

                     3 + 3i
    −19 + 48i ) −211 + 102i
                −201 +  87i           3 − i
                 −10 +  15i ) −19 + 48i
                              −15 + 55i          −1 − 2i
                               −4 −  7i ) −10 + 15i
                                          −10 + 15i
                                                  0

Again we see that $\gcd(z, w) = 4 + 7i$. So $4 + 7i$ is a factor of both $z$
and $w$.

(ii) Is the gcd $4 + 7i$ prime? If not, can we factor it explicitly? Since

$$N(4 + 7i) = 65 = 13 \cdot 5,$$

we claim that the norm of any prime factor $\pi$ of $4 + 7i$ is either 13 or 5.
If $4 + 7i = \pi\alpha$, where $\alpha \in \mathbb{Z}[i]$ is not a unit, then $N(\alpha) > 1$, by
Lemma 8.10. Hence, as $N(4 + 7i) = N(\pi)N(\alpha)$, we must have $N(\pi) = 13$
or $N(\pi) = 5$. Well, 5 is small enough to do a direct check, and the
only Gaussian integers with norm 5 are $2 \pm i$ and their associates. And
we’re in luck; $2 + i$ is a factor:

$$\frac{4 + 7i}{2 + i} = 3 + 2i.$$

Note that $N(3 + 2i) = 13$, so that $3 + 2i$ is prime, by Proposition 8.11.

(iii) To factor $w = -211 + 102i$, divide by the gcd:

$$\frac{-211 + 102i}{4 + 7i} = -2 + 29i.$$

Since $N(-2 + 29i) = 845 = 5 \cdot 13^2$, the same process as in part (ii)
shows that

$$-2 + 29i = (2 + i)(5 + 12i) = (2 + i)(3 + 2i)^2.$$

Putting it all together, we have the prime factorization of $w$:

$$w = -211 + 102i = (4 + 7i)(-2 + 29i) = (2 + i)(3 + 2i)(2 + i)(3 + 2i)^2 = (2 + i)^2(3 + 2i)^3.$$

We leave it to you to find the prime factorization of $z$ and to show that
$\gcd(z, w)\,\operatorname{lcm}(z, w) = zw$. ▲


In Section 3.4, we used the 
fact that 

$5 + 12i = (3 + 2i)^2$ when
we generated Pythagorean 
triples with Gaussian 
integers. 


How to Think About It. How do we factor a positive rational integer m 
into primes? First of all, there is an algorithm determining whether m is prime. 
Use the Division Algorithm to see whether $2 \mid m$. If $2 \nmid m$, use the Division
Algorithm to see whether $3 \mid m$. And so forth. Now if $d$ is a divisor of $m$,
then $d \le m$, and so there are only finitely many candidates for divisors; hence,
this process must stop. Of course, if we have any extra information about m, 
we may use it to cut down on the number of candidates. We must say that 
this algorithm is useful only for small numbers m; after all, the difficulty in 
factoring large numbers is the real reason that public key codes are secure. 

A variation of this algorithm can be used to factor nonzero Gaussian integers.
If $d, w \in \mathbb{Z}[i]$ and $d \mid w$, then $N(d) \le N(w)$; hence, there are only
finitely many Gaussian integers $z$ which are candidates for being divisors of $w$.
If $N(w)$ is prime, then Proposition 8.11 says that $w$ is prime; if $N(w)$ is
composite, we can proceed as in the last part of Example 8.12.
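The finite search just described can be sketched as a trial-division routine. The code below is one possible reading of the procedure, not the book's method: it represents $a + bi$ as `(a, b)`, tries candidate divisors in order of increasing norm (a non-unit divisor of least norm must be prime), and only looks at candidates with $a > 0$, $b \ge 0$, since every nonzero Gaussian integer has exactly one associate of that shape. All function names are hypothetical.

```python
from math import isqrt

def g_norm(z):
    a, b = z
    return a * a + b * b

def g_exact_div(w, d):
    """Return w/d if d divides w in Z[i], otherwise None."""
    (a, b), (c, e) = w, d
    n = c * c + e * e
    re, im = a * c + b * e, b * c - a * e     # w * conj(d)
    if re % n or im % n:
        return None
    return (re // n, im // n)

def smallest_prime_divisor(w):
    """A non-unit proper divisor of least norm, or None if w is prime or a unit."""
    bound = isqrt(g_norm(w))                  # a proper factor has norm <= sqrt(N(w))
    side = isqrt(bound)
    candidates = sorted(
        ((a, b) for a in range(1, side + 1) for b in range(side + 1)
         if 2 <= a * a + b * b <= bound),
        key=g_norm)
    for d in candidates:
        if g_exact_div(w, d) is not None:
            return d
    return None

def g_factor(w):
    """Return (unit, primes) with w = unit * product(primes), for w != 0."""
    primes = []
    while g_norm(w) > 1:
        d = smallest_prime_divisor(w) or w    # no proper divisor: w itself is prime
        primes.append(d)
        w = g_exact_div(w, d)
    return w, primes

print(g_factor((-211, 102)))
```

For $w = -211 + 102i$ this returns the unit `(1, 0)` and the primes `(2, 1)` twice and `(3, 2)` three times, in agreement with $w = (2 + i)^2(3 + 2i)^3$. As with trial division in $\mathbb{Z}$, the search is practical only for small norms.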


Laws of Decomposition 

We now describe the primes in $\mathbb{Z}[i]$ (there will be a similar story for $\mathbb{Z}[\omega]$).
The next lemma lets us concentrate on how primes downstairs in $\mathbb{Z}$ behave
when they are viewed as elements upstairs in $\mathbb{Z}[i]$.

Notation. It gets tedious to keep saying “let p be a prime in Z.” From now on, 
let’s call primes in Z rational primes to distinguish them from primes in other 
rings. Remember that a prime (or irreducible) element in a commutative ring 
is one whose only divisors are units and associates. We may also say rational 
integer to distinguish an ordinary integer in Z from a Gaussian integer, an 
Eisenstein integer or, more generally, a cyclotomic integer. 

Lemma 8.13. Every prime $\pi$ in $\mathbb{Z}[i]$ divides a rational prime.

Proof. Every Gaussian integer $z$ divides its norm in $\mathbb{Z}[i]$, for $N(z) = z\overline{z}$. In
particular, $\pi$ divides a rational integer, namely its norm. Now $N(\pi)$ factors
into primes in $\mathbb{Z}$:

$$\pi\overline{\pi} = N(\pi) = p_1 p_2 \cdots p_k,$$

and so

$$\pi \mid p_1 p_2 \cdots p_k.$$

But $\pi$ is a prime in $\mathbb{Z}[i]$; hence, by Euclid’s Lemma in $\mathbb{Z}[i]$, it divides one of
the (prime) factors $p_j$ on the right. ■

The primes on the right-hand side are elements of $\mathbb{Z}[i]$ as well as of $\mathbb{Z}$.
Example 8.14. We have seen that $\pi = 3 + 2i$ is a prime in $\mathbb{Z}[i]$; note that $\pi$
divides 13, for

$$N(\pi) = (3 + 2i)(3 - 2i) = 13.$$

We sometimes say that $3 + 2i$ lies above 13. ▲

As we said earlier, there are primes in $\mathbb{Z}$ that remain prime in $\mathbb{Z}[i]$ and
others that factor into new primes; the same is true for $\mathbb{Z}[\omega]$. It turns out that
there’s a beautiful theory, going back to Gauss, for how primes decompose in
these rings, a theory that brings together many of the ideas you’ve studied so
far. For example, here are some factorizations of rational primes when viewed
as elements in $\mathbb{Z}[i]$:

$$5 = (2 + i)(2 - i); \qquad 13 = (3 + 2i)(3 - 2i); \qquad 29 = (5 + 2i)(5 - 2i).$$

In each of these cases, the rational prime decomposes as a norm: the product 
of a Gaussian integer and its conjugate. This is always the case. 

Lemma 8.15. Let $p$ be a rational prime. If $p$ is not prime in $\mathbb{Z}[i]$, then there
exists some prime $z$ in $\mathbb{Z}[i]$ with

$$p = z\overline{z} = N(z).$$

Proof. Suppose $p = zw$, where $z$ and $w$ are non-unit Gaussian integers. Then

$$p^2 = N(p) = N(zw) = N(z)N(w),$$

where neither $N(z)$ nor $N(w)$ is 1. But this is an equation in $\mathbb{Z}$, and so unique
factorization in $\mathbb{Z}$ gives $p = N(z) = z\overline{z}$. Finally, $z$ is prime in $\mathbb{Z}[i]$, by
Proposition 8.11, because $N(z)$ is a rational prime. ■

And, in fact, $\overline{z}$ and $w$ must be associates.

Lemma 8.15 narrows the situation quite a bit. It says that if a rational prime
factors in $\mathbb{Z}[i]$, it factors into exactly two conjugate Gaussian integers, each
prime in $\mathbb{Z}[i]$. We say that such a rational prime splits in $\mathbb{Z}[i]$. We can state
the result of Lemma 8.15 using only the arithmetic of $\mathbb{Z}$. Since $N(a + bi) = a^2 + b^2$,
the lemma says that a prime that splits can be written as the sum of
two perfect squares. And the converse is also true.

Proposition 8.16. A rational prime $p$ splits in $\mathbb{Z}[i]$ if and only if $p$ is a sum of
two squares in $\mathbb{Z}$.

Proof. If $p$ splits in $\mathbb{Z}[i]$, then Lemma 8.15 says that there is a Gaussian integer
$z = a + bi$ such that $p = N(z) = a^2 + b^2$.

Conversely, if $p = a^2 + b^2$, then $p = (a + bi)(a - bi)$. But $a + bi$ is prime
in $\mathbb{Z}[i]$, by Proposition 8.11, because its norm, $N(a + bi) = p$, is a rational
prime. ■

The question of which rational primes split in Z[i] thus comes down to the
question of which primes are sums of two squares. Not every rational prime is
a sum of two squares; for example, it’s easy to see that 11 is not. Here is a nice
(and perhaps surprising) connection to modular arithmetic. A quick example
gives the idea. The prime 29 is the sum of two squares:

29 = 2^2 + 5^2.

As an equation in F_29, this says that 2^2 + 5^2 = 0. Multiply both sides by 6^2,
for 6 = 5^{-1} in F_29:

0 = 2^2 6^2 + 5^2 6^2
= (2 · 6)^2 + (5 · 6)^2
= (2 · 6)^2 + 1,

so that 2 · 6 = 12 is a root of x^2 + 1 in F_29. More generally, suppose that p
is a prime and p = a^2 + b^2. We can assume that 0 < a, b < p, so that both
a and b are units in F_p. We can write this as an equation: a^2 + b^2 = 0 in F_p.
Multiplying both sides by (b^{-1})^2, we get:

(ab^{-1})^2 + 1 = 0;

that is, ab^{-1} is a root of x^2 + 1 in F_p. And the converse is true as well:

Proposition 8.17. A rational prime p is a sum of two squares if and only if
x^2 + 1 has a root in F_p.

Proof. We’ve just seen that an expression of p as the sum of two squares leads
to a root of x^2 + 1 in F_p.

Going the other way, suppose there is an integer n whose congruence class
satisfies

n^2 + 1 = 0 in F_p.

Then, moving back to Z, we see that n satisfies

p | (n^2 + 1).

Now go upstairs to Z[i]. We have

p | (n + i)(n - i).

But p divides neither n + i nor n - i in Z[i] (otherwise, n/p ± i/p would
be Gaussian integers). Since p divides the product but neither factor, Euclid’s
Lemma says that p is not prime in Z[i] and, hence, by Lemma 8.15, p is the
norm of some Gaussian integer z; that is, p is a sum of two squares. ■
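The second half of this proof can be carried out numerically. The sketch below is our own illustration, not part of the text: it assumes Gaussian integers are stored as pairs (a, b) standing for a + bi, and the names gauss_divmod, gauss_gcd, and two_squares are hypothetical. Given a root n of x^2 + 1 in F_p, it runs the Euclidean algorithm in Z[i] on p and n + i; the gcd is a Gaussian integer whose norm is p.

```python
def gauss_divmod(z, w):
    """Divide z by w in Z[i], rounding the exact quotient to a nearest Gaussian integer."""
    a, b = z
    c, d = w
    n = c * c + d * d                      # N(w), assumed nonzero
    re = a * c + b * d                     # real part of z * conj(w)
    im = b * c - a * d                     # imaginary part of z * conj(w)
    q = ((2 * re + n) // (2 * n), (2 * im + n) // (2 * n))
    r = (a - (q[0] * c - q[1] * d), b - (q[0] * d + q[1] * c))
    return q, r

def gauss_gcd(z, w):
    """Euclidean algorithm in Z[i]; the result is a gcd, determined up to a unit."""
    while w != (0, 0):
        _, r = gauss_divmod(z, w)
        z, w = w, r
    return z

def two_squares(p):
    """Return (a, b) with a^2 + b^2 = p, assuming x^2 + 1 has a root in F_p."""
    # next() raises StopIteration when no root exists (for example, p = 7).
    n = next(n for n in range(1, p) if (n * n + 1) % p == 0)
    a, b = gauss_gcd((p, 0), (n, 1))
    return abs(a), abs(b)

print(two_squares(29))
```

For p = 29 the root found is n = 12, the same root that appeared in the example with F_29 above.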

Corollary 8.18. A rational prime p factors in Z[i] if and only if x^2 + 1 has a
root in F_p.

Proof. Apply Proposition 8.17 and the Factor Theorem (Corollary 6.15). ■

Let’s summarize these various equivalent statements about a rational prime p.

• p factors in Z[i].

• p = N(z) for some z in Z[i].

• p = a^2 + b^2 in Z.

• x^2 + 1 has a root in F_p.

The last criterion may seem the most remote, but it is actually the easiest to
use: you have to check at most (p - 1)/2 possible solutions to x^2 + 1 = 0
(because if a is a solution, so is -a). If you try a few numerical cases, a
pattern begins to emerge: the odd primes satisfying the last criterion all seem
to be congruent to 1 mod 4. That’s quite a beautiful and elegant result, which
adds one more equivalent statement to the summarizing list above.
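If you would rather let a machine try the numerical cases, here is a minimal sketch. The function names are our own, and the search is plain brute force, so it is only meant for small primes. It tabulates, for each odd prime below 60, its residue mod 4, whether it is a sum of two squares, and whether x^2 + 1 has a root in F_p.

```python
from math import isqrt

def is_prime(n):
    """Trial division; fine for the small numbers used here."""
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))

def is_sum_of_two_squares(p):
    """True if p = a^2 + b^2 for some integers a, b >= 0."""
    return any(isqrt(p - a * a) ** 2 == p - a * a for a in range(isqrt(p) + 1))

def has_root_of_x2_plus_1(p):
    """Check only 1, ..., (p - 1)/2: if a is a root, so is -a."""
    return any((x * x + 1) % p == 0 for x in range(1, (p - 1) // 2 + 1))

for p in filter(is_prime, range(3, 60)):
    print(p, p % 4, is_sum_of_two_squares(p), has_root_of_x2_plus_1(p))
```

The two truth-value columns should agree with each other in every row, as Proposition 8.17 predicts.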


Recall that F_p is another notation for Z_p, the field of integers modulo p.

Another way to say this: p is the sum of two squares if and only if -1 is a
square in F_p.

So, p factors in Z[i] if and only if x^2 + 1 factors in F_p[x].

See Exercise 6.38 on page 263.

The name of a theorem may not coincide with the name of the first person
who proved it.

Are there other primes in Z[i] that are associate to their conjugates? That
question is Exercise 8.10 on page 343.


Theorem 8.19. If p is an odd prime, then x^2 + 1 has a root in F_p if and only
if p ≡ 1 mod 4.

Proof. Assume that p = 4k + 1. Since p is prime, Theorem 4.9 (Fermat’s
Little Theorem) gives a^{p-1} = 1 for all nonzero a in F_p. Thus, we have the
factorization in F_p[x]:

x^{p-1} - 1 = (x - 1)(x - 2)(x - 3) ··· (x - (p - 1)). (8.2)

Hence,

x^{p-1} - 1 = x^{4k} - 1
= (x^4)^k - 1
= (x^4 - 1)[(x^4)^{k-1} + (x^4)^{k-2} + (x^4)^{k-3} + ··· + 1]   (by Exercise 6.47 on page 269)
= (x^2 + 1)[(x^2 - 1)((x^4)^{k-1} + (x^4)^{k-2} + (x^4)^{k-3} + ··· + 1)]
= (x^2 + 1)h(x). (8.3)

Comparing Eqs. (8.2) and (8.3), the two factorizations of x^{p-1} - 1, and using
unique factorization in F_p[x], we see that x^2 + 1 = (x - α)(x - β) for some
α, β ∈ F_p.

Conversely, if p is odd and p ≢ 1 mod 4, then p ≡ 3 mod 4 (it can’t be
congruent to 0 or 2). But, by Proposition 8.17, if x^2 + 1 has a root in F_p, then
p is the sum of two squares in Z. However, the sum of two squares in Z is
never congruent to 3 mod 4: if a ≡ 0, 1, 2, 3, then a^2 ≡ 0, 1, 0, 1 mod 4, and
so a^2 + b^2 ≡ 0, 1, 2 mod 4. ■

Proposition 8.17, when combined with Theorem 8.19, gives us a nice fact of
arithmetic, first established by Gauss.

Corollary 8.20 (Fermat’s Two-Square Theorem). An odd rational prime p
is a sum of two squares if and only if p ≡ 1 mod 4.

Theorem 8.19 tells the whole story for odd primes: primes that are congruent
to 1 mod 4 split into two conjugate factors, and primes that are congruent to
3 mod 4 stay prime (we call primes downstairs that stay prime upstairs inert).
There is one prime we haven’t yet considered: p = 2. Now 2 factors in Z[i],
because x^2 + 1 factors in Z_2[x]. In fact

2 = (1 + i)(1 - i).

But note that these two factors are associates:

1 + i = i(1 - i),

and so

2 = i(1 - i)^2.

Thus, 2 splits in a special way: it is associate to the square of a prime. We say
that 2 ramifies in Z[i]. Hence, our discussion gives a complete classification
of how rational primes decompose in the Gaussian integers.


Theorem 8.21 (Law of Decomposition in Gaussian Integers). Every rational
prime p decomposes in Z[i] in one of three ways.

(1) p splits into two conjugate prime factors if p ≡ 1 mod 4

(2) p is inert if p ≡ 3 mod 4

(3) p = 2 ramifies: 2 = i(1 - i)^2.

Corollary 8.22 (Classification of Gaussian Primes). The primes π in Z[i]
are of three types:

(1) π = a + bi, which lies above a rational prime p with p ≡ 1 mod 4; in
this case, N(π) = a^2 + b^2

(2) π = p, where p is a rational prime with p ≡ 3 mod 4; in this case,
N(p) = p^2

(3) π = 1 - i and its associates; in this case, N(1 - i) = 2.

Proof. Now π divides some rational prime p, by Lemma 8.13. If p ≡ 1 mod 4,
then π is of the first type; if p ≡ 3 mod 4, then p is inert and π = p (up to a
unit factor); if p = 2, then p ramifies and π = 1 - i (up to a unit factor). ■
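Corollary 8.22 turns into a short primality test for Gaussian integers. The following is a sketch under our own conventions, not the text’s: a + bi is passed as two integers, and the names is_prime and is_gaussian_prime are hypothetical. If a and b are both nonzero, a + bi is prime exactly when its norm is a rational prime (types 1 and 3); if one of them is zero, a + bi is a unit times a rational integer, which is prime exactly when that integer is a rational prime congruent to 3 mod 4 (type 2).

```python
from math import isqrt

def is_prime(n):
    """Trial division on a rational integer."""
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))

def is_gaussian_prime(a, b):
    """Decide whether a + bi is prime in Z[i], following Corollary 8.22."""
    if a == 0 or b == 0:
        n = abs(a + b)                      # a + bi is a unit times n
        return is_prime(n) and n % 4 == 3   # inert rational primes
    return is_prime(a * a + b * b)          # norm is 2 or a prime that is 1 mod 4

print([(a, b) for a in range(5) for b in range(5) if is_gaussian_prime(a, b)])
```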

As we mentioned earlier, 7 is a prime in Z[i], for 7 ≡ 3 mod 4, but its
norm 49 is not a rational prime. Thus, the converse of Proposition 8.11 is false.


How to Think About It. The fact that 2 = i(1 - i)^2 can be stated in terms
of ideals in Z[i]: there is equality of ideals

(2) = ((1 - i)^2).

In fact, if we use the definition of the product of ideals from Exercise 5.51 on
page 220, the above equation of ideals can be written as

(2) = (1 - i)(1 - i) = (1 - i)^2.


Exercises 

8.8 (i) In Example 8.12 we found a gcd of z = -19 + 48i and w = -211 + 102i
to be 4 + 7i. Write 4 + 7i as a linear combination of z and w.

(ii) Use part (i) to find the prime factorization of z.

(iii) Show that gcd(z, w) lcm(z, w) = zw.

8.9 Show that if two Gaussian integers z and w have relatively prime norms in Z,
then z and w are relatively prime in Z[i]. Is the converse true?

8.10 * Which primes in Z[i] are associate to their conjugates?

8.11 How many non-associate primes in Z[i] lie above 5?

8.12 In Z[i], show that every associate of a + bi is conjugate to an associate of b + ai.

8.13 Show that every Gaussian integer is associate to one in the first quadrant of the
complex plane. (We define the first quadrant to include the nonnegative x-axis but
not the positive y-axis.)

8.14 Show that if two integers a and b can each be written as the sum of two squares,
so can ab.

Try some other primes in Z[ω]. Any conjectures about which ones split?

8.15 Factor each of these into primes in Z[i].

(i) 101 (ii) 31 (iii) 37

(iv) 7 + 4i (v) 8 + i (vi) 65

(vii) 7 + 3i (viii) 40 + 42i (ix) 154 + 414i

8.16 Find the number of elements in each of the quotient rings.

(i) Z[i]/(1 - i) (ii) Z[i]/(2 + i)

(iii) Z[i]/(3 + 2i) (iv) Z[i]/(5 + 12i)

8.17 Take It Further. If z is a Gaussian integer, show that

|Z[i]/(z)| = N(z).


Eisenstein Primes 

The whole theory just given for Z[i] carries over to Z[ω]. Of course, the
statements have to be modified slightly, but the proofs are almost identical to the
corresponding results in Z[i]. If you think about it, this shouldn’t be a surprise:
a proof using only algebraic properties of norm (for example, it is multiplicative)
and properties of PIDs (unique factorization and Euclid’s Lemma) should
carry over mutatis mutandis.

We summarize the results for Eisenstein integers, providing sketches of
proofs where we think it’s necessary, but we leave the details to you. And
these are important exercises, because they will help you digest the ideas in
both rings.

Lemma 8.23. Every prime in Z[ω] divides a rational prime.

Proof. Mimic the proof of Lemma 8.13. ■

How about a law of decomposition for Eisenstein integers? Some rational
primes factor in Z[ω]; for example:

7 = (3 + ω)(3 + ω^2); 31 = (5 - ω)(5 - ω^2); 97 = (3 - 8ω)(3 - 8ω^2).

In each of these cases, the prime in Z decomposes in Z[ω] into a norm: the
product of an Eisenstein integer and its conjugate. This is always the case.

Lemma 8.24. Let p be a rational prime. If p is not prime in Z[ω], then p =
z z̄ = N(z) for some prime z in Z[ω].

Proof. Mimic the proof of Lemma 8.15. ■ 

As happened in Z[i], we can restate the result of Lemma 8.24 completely
in terms of the arithmetic of Z. Since N(a + bω) = a^2 - ab + b^2, the lemma
says that a prime that splits can be written in this form. The converse is also
true.

Proposition 8.25. A rational prime p splits in Z[ω] if and only if p can be
expressed as a^2 - ab + b^2 for integers a, b ∈ Z.


Proof. Mimic the proof of Proposition 8.16. ■

Proposition 8.17 says that a rational prime p is a norm of some prime in
Z[i] if and only if x^2 + 1 has a root in F_p. What might be an analog of this
result for Z[ω]? Well, x^2 + 1 is the minimal polynomial for i; the minimal
polynomial for ω is x^2 + x + 1. A careful look at the proof of Proposition 8.17
shows that we can use x^2 + x + 1 and modify the proof slightly to obtain
another lovely result.


Proposition 8.26. A rational prime p can be expressed as a^2 - ab + b^2 for
integers a, b ∈ Z if and only if x^2 + x + 1 has a root in F_p.


Proof. Suppose that p = a^2 - ab + b^2 for integers a and b. Then p doesn’t
divide either a or b (otherwise p^2 | (a^2 - ab + b^2)) and, hence, b is a unit in
F_p. Multiply both sides by (b^{-1})^2 to obtain

p(b^{-1})^2 = (-ab^{-1})^2 + (-ab^{-1}) + 1,

so that -ab^{-1} is a root of x^2 + x + 1 in F_p (the left-hand side is 0 in F_p).

Going the other way, suppose that there is a congruence class [m] ∈ F_p with

[m]^2 + [m] + 1 = 0 in F_p.

Then, in Z, we have

m^2 + m + 1 ≡ 0 mod p;

that is,

p | (m^2 + m + 1).

Now move up to Z[ω]. We have

p | (m - ω)(m - ω^2),

or

p | (m - ω)(m + 1 + ω).

But p doesn’t divide either m - ω or m + 1 + ω in Z[ω] (otherwise
m/p - (1/p)ω or (m + 1)/p + (1/p)ω would be an Eisenstein integer). Thus, by
Euclid’s Lemma, p is not prime in Z[ω] and hence, by Lemma 8.24, p is the
norm of an Eisenstein integer a + bω; that is, p = a^2 - ab + b^2 for integers
a and b. ■


We get the next corollary. 


Corollary 8.27. A rational prime p factors in Z[ω] if and only if x^2 + x + 1
has a root in F_p.


Proof. Apply Proposition 8.26 and the Factor Theorem (Corollary 6.15). ■

We summarize the chain of equivalent statements.

• p factors in Z[ω].

• p = N(z) for some z in Z[ω].

• p = a^2 - ab + b^2 in Z.

• x^2 + x + 1 has a root in F_p.

Onward to a law of decomposition in Z[ω]. Numerical experiments (we
hope you’ll try some) suggest that if p is a rational prime and p ≡ 1 mod 3,
then x^2 + x + 1 has a root in F_p. The proof of Theorem 8.19 suggests a reason
why.
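One way to run such an experiment is sketched below. The code is our own and the function names are hypothetical; both searches are brute force, so keep the primes small. For each prime below 60 it prints the residue mod 3, whether x^2 + x + 1 has a root in F_p, and whether p has the form a^2 - ab + b^2. The search for a and b is exhaustive because a^2 - ab + b^2 = (a - b/2)^2 + 3b^2/4 forces b^2 ≤ 4p/3, and likewise for a.

```python
from math import isqrt

def is_prime(n):
    """Trial division; adequate for small experiments."""
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))

def has_root(p):
    """True if x^2 + x + 1 has a root in F_p."""
    return any((x * x + x + 1) % p == 0 for x in range(p))

def is_eisenstein_norm(p):
    """True if p = a^2 - ab + b^2 for some integers a, b."""
    bound = isqrt(4 * p // 3) + 1
    return any(a * a - a * b + b * b == p
               for a in range(-bound, bound + 1)
               for b in range(-bound, bound + 1))

for p in filter(is_prime, range(2, 60)):
    print(p, p % 3, has_root(p), is_eisenstein_norm(p))
```

By Proposition 8.26, the last two columns should agree in every row; the interesting question is how they line up with the residue mod 3.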


What is the discriminant of x^2 + x + 1?

Check that (m - ω)(m - ω^2) = m^2 + m + 1.

So, p factors in Z[ω] if and only if x^2 + x + 1 factors in F_p[x].

See Exercise 6.38 on page 263.

Recall that the units in Z[ω] are ±1, ±ω, and ±ω^2.

Are there other primes in Z[ω] that are associate to their conjugates? That
question is Exercise 8.18 on page 349.

In terms of ideals, (3) = ((1 - ω)^2) = (1 - ω)^2.


Proposition 8.28. If p is a prime and p ≡ 1 mod 3, then x^2 + x + 1 has a
root in F_p.

Proof. Suppose that p = 3k + 1. Because p is prime, a^{p-1} = 1 for all nonzero
a in F_p (Fermat’s Little Theorem, Theorem 4.9). Hence, in F_p[x], we
have the factorization

x^{p-1} - 1 = (x - 1)(x - 2)(x - 3) ··· (x - (p - 1)). (8.4)

But

x^{p-1} - 1 = x^{3k} - 1
= (x^3)^k - 1
= (x^3 - 1)[(x^3)^{k-1} + (x^3)^{k-2} + (x^3)^{k-3} + ··· + 1]   (Exercise 6.47 on page 269)
= (x^2 + x + 1)[(x - 1)((x^3)^{k-1} + (x^3)^{k-2} + (x^3)^{k-3} + ··· + 1)]
= (x^2 + x + 1)h(x). (8.5)

Comparing Eqs. (8.4) and (8.5), the two factorizations of x^{p-1} - 1, and using
unique factorization in F_p[x], we see that x^2 + x + 1 = (x - α)(x - β) for
some α, β ∈ F_p; that is, x^2 + x + 1 has a root in F_p. ■

What about primes that are not congruent to 1 mod 3? One case is easily
handled. Suppose that p ≡ 2 mod 3. By Proposition 8.26, x^2 + x + 1 has a
root in F_p if and only if p can be written as a^2 - ab + b^2 for a, b ∈ Z. But you
can check that, for any choice of a and b, a^2 - ab + b^2 is never congruent to 2
mod 3 (just look at the possible congruence classes of a and b mod 3). Thus,
p is inert; that is, p is prime in Z[ω]. There is only one more prime, namely 3,
the prime congruent to 0 mod 3. And x^2 + x + 1 certainly has a root in F_3,
namely 1. Therefore, 3 must split; in fact,

3 = (2 + ω)(2 + ω^2) = (2 + ω)(1 - ω).

But the important thing is that the two factors on the right are associates. You
can check that

-ω^2(1 - ω) = 2 + ω.

So, our factorization of 3 can be written as

3 = -ω^2(1 - ω)^2,

and 3 is a ramified prime. Putting it all together, we have the law of
decomposition in Z[ω] as well as a description of all Eisenstein primes.

Theorem 8.29 (Law of Decomposition in Eisenstein Integers). Every rational
prime p decomposes in Z[ω] in one of three ways.

(1) p splits into two conjugate prime factors if p ≡ 1 mod 3

(2) p is inert if p ≡ 2 mod 3

(3) 3 ramifies: 3 = -ω^2(1 - ω)^2.

Corollary 8.30 (Classification of the Eisenstein primes). The primes π in
Z[ω] are of three types:

(1) π = a + bω, which lies above a rational prime congruent to 1 mod 3; in
this case, N(π) = a^2 - ab + b^2.

(2) primes p in Z that are congruent to 2 mod 3; in this case, N(p) = p^2.

(3) the prime 1 - ω and its associates; in this case, N(1 - ω) = 3.

The relation between the factorization of a prime p in Z[ω] and the
factorization of x^2 + x + 1 in F_p[x] can be used to factor Eisenstein integers.


Example 8.31. (i) Let p = 31. There are two roots of x^2 + x + 1 in F_31,
namely 5 and 25, and so x^2 + x + 1 = (x - 5)(x - 25) in F_31[x]. Lift
this equation to Z[x]:

x^2 + x + 1 = (x - 5)(x - 25) + 31(x - 4).

So, letting x = ω, we have

(5 - ω)(25 - ω) = 31(4 - ω).

Now N(5 - ω) = 31, so 5 - ω is a prime factor of 31, and the other is

(25 - ω)/(4 - ω) = 6 + ω.

(ii) Let p = 97. There are two roots of x^2 + x + 1 in F_97, namely 61 and 35.
In fact,

x^2 + x + 1 = (x - 61)(x - 35) + 97(x - 22).

Letting x = ω,

(61 - ω)(35 - ω) = 97(22 - ω),

and so

97 = (61 - ω)(35 - ω)/(22 - ω).

Now, N(22 - ω) = 507 = 3 · 13^2; since N(1 - ω) = 3 and N(4 + ω) = 13,
we have (after checking for unit factors)

22 - ω = (1 - ω)(4 + ω)^2.

Some of these factors divide 61 - ω; the rest divide 35 - ω. We have

N(61 - ω) = 3 · 13 · 97
N(35 - ω) = 13 · 97.

We can cancel the factor of 13 by dividing by 4 + ω; it’s easier to work
with 35 - ω:

(35 - ω)/(4 + ω) = (35 - ω)(4 + ω^2)/13 = 8 - 3ω.

Bingo: N(8 - 3ω) = 97, so that

97 = (8 - 3ω)(8 - 3ω^2). ▲
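The arithmetic in this example is easy to check by machine. Here is a minimal sketch, assuming we store a + bω as the pair (a, b); the helper names eis_mul, eis_conj, and eis_norm are our own. Multiplication uses ω^2 = -1 - ω, and the conjugate of a + bω is a + bω^2 = (a - b) - bω.

```python
def eis_mul(z, w):
    """(a + b*omega)(c + d*omega), reduced with omega^2 = -1 - omega."""
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + b * c - b * d)

def eis_conj(z):
    """Conjugate: a + b*omega^2 = (a - b) - b*omega."""
    a, b = z
    return (a - b, -b)

def eis_norm(z):
    """N(a + b*omega) = a^2 - ab + b^2."""
    a, b = z
    return a * a - a * b + b * b

# Part (i): 31 = (5 - omega)(6 + omega)
print(eis_mul((5, -1), (6, 1)))
# Part (ii): 22 - omega = (1 - omega)(4 + omega)^2 and 97 = (8 - 3*omega)(8 - 3*omega^2)
print(eis_mul((1, -1), eis_mul((4, 1), (4, 1))))
print(eis_mul((8, -3), eis_conj((8, -3))))
```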


This example connects to Exercise 8.24 on page 349.

The result of this example will be useful in the next section.


How to Think About It. Because Z[i] and Z[ω] are commutative rings, we
can construct quotient rings. And, since both rings are PIDs, they often look
very similar to rings we have already met.


Example 8.32. We investigate the quotient ring R = Z[ω]/(λ), where

λ = 1 - ω

is the prime lying over the rational (and ramified) prime 3. For any Eisenstein
integer z, let’s look at the remainder after dividing z by λ. Proposition 8.4 gives
Eisenstein integers q and r such that

z = qλ + r, with r = 0 or N(r) < N(λ).

Now N(λ) = 3, so that N(r) must be 0, 1, or 2. There are no Eisenstein
integers of norm 2, because 2 is inert in Z[ω]. Hence N(r) is 0 or 1. If N(r) =
0, then r = 0; if N(r) = 1, then r is a unit in Z[ω]. So, aside from 0, we
need only investigate the six Eisenstein units. It turns out that each of these is
congruent to 1 or -1 modulo λ:


If r = 1, then z = qλ + 1 and z ≡ 1 mod λ.

If r = -1, then z = qλ - 1 and z ≡ -1 mod λ.

If r = ω, then z = qλ + ω = (q - 1)λ + 1 and z ≡ 1 mod λ.

If r = -ω, then z = qλ - ω = (q + 1)λ - 1 and z ≡ -1 mod λ.

If r = ω^2, then z = qλ + ω^2 = (q - 1 - ω)λ + 1 and z ≡ 1 mod λ.

If r = -ω^2, then z = qλ - ω^2 = (q + 1 + ω)λ - 1 and z ≡ -1 mod λ.

So, every element of Z[ω] is congruent mod λ to one of 0, 1, or -1. This
suggests that Z[ω]/(λ) is none other than our friend F_3. And, in fact, that’s
true.

Proposition 8.33. If λ = 1 - ω, then

Z[ω]/(λ) ≅ F_3.

Proof. By Proposition 7.13, the quotient ring Z[ω]/(λ) is a field, while
Example 8.32 shows that the field has exactly 3 elements. Therefore, Z[ω]/(λ) ≅
F_3, for Corollary 7.40 says that two finite fields with the same number of
elements are isomorphic. ■
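The reduction map Z[ω] → F_3 can be written down explicitly. Since λ = 1 - ω, we have ω ≡ 1 mod λ, and λ | 3, so a + bω ≡ a + b mod λ, reduced mod 3. The sketch below is our own illustration (the name residue_mod_lambda is hypothetical, and a + bω is passed as two integers); it returns the representative in {-1, 0, 1}.

```python
def residue_mod_lambda(a, b):
    """Representative in {-1, 0, 1} of a + b*omega modulo lambda = 1 - omega."""
    # omega is congruent to 1 mod lambda and 3 is congruent to 0,
    # so only (a + b) mod 3 matters.
    return (a + b + 1) % 3 - 1

# The six units 1, -1, omega, -omega, omega^2, -omega^2 as pairs (a, b);
# recall omega^2 = -1 - omega.
units = [(1, 0), (-1, 0), (0, 1), (0, -1), (-1, -1), (1, 1)]
print([residue_mod_lambda(a, b) for a, b in units])
```

The printed residues can be compared with the six cases worked out in Example 8.32.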

The results in this section just scratch the surface, for life is more compli- 
cated; there are rings of cyclotomic integers that are not PIDs. We shall have 
more to say about this when we discuss the work of Kummer. 


Further Results. The laws of decomposition for Z[i] (Theorem 8.21) and
Z[ω] (Theorem 8.29) show that the decomposition of a rational prime depends
only on its congruence class modulo a fixed integer: 4 for Z[i] and 3 for Z[ω].
This theory was greatly generalized in the twentieth century to Class Field
Theory, which determines laws of decomposition of primes in rings of
cyclotomic integers, thereby bringing together under one roof many of the main
ideas in modern algebra.

Exercises

8.18 * Which primes in Z[ω] are associate to their conjugates?

8.19 For which primes p is x^2 + x + 1 a perfect square in F_p[x]?

8.20 Working in Z[ω], under what conditions are a + bω and b + aω associates?

8.21 In Z[ω],

(i) What are all the associates of the prime 1 - ω?

(ii) Show that 1 - ω and 1 - ω^2 are associates.

(iii) Write (1 - ω)(1 - ω^2) as a + bω.

(iv) What is the minimal polynomial of 1 - ω?

8.22 If z, w, v are elements of Z[ω] (or Z[i]), show that

(i) If z | w, then z̄ | w̄.

(ii) If z | w, then N(z) | N(w) in Z.

(iii) If z ≡ w mod v, then z̄ ≡ w̄ mod v̄.

8.23 Show that a rational prime p splits in Z[ω] if and only if -3 is a square mod p.

8.24 Show there are isomorphisms of commutative rings,

(i) Z[i] ≅ Z[x]/(x^2 + 1).

(ii) Z[ω] ≅ Z[x]/(x^2 + x + 1).

8.25 * Find all units u in Z[ω] such that u ≡ 1 mod 3.

8.26 Factor into primes in Z[ω].

(i) 301 (ii) 307 (iii) 5 + 8ω

(iv) 5 + ω (v) 19 + 18ω (vi) 39 + 55ω

(vii) 61 - ω (viii) 62 + 149ω (ix) 87 - 62ω

8.27 Find the number of elements in

(i) Z[ω]/(2 + ω) (ii) Z[ω]/(4 - ω)

(iii) Z[ω]/(6 + ω) (iv) Z[ω]/(31)

8.28 Take It Further. If z is an Eisenstein integer, show that

|Z[ω]/(z)| = N(z).


Note that x^2 + x + 1 = (x - ω)(x - ω^2). What happens if you put x = 1?

Note that 3 is a unit times (1 - ω)^2.

8.3 Fermat’s Last Theorem for Exponent 3

The goal of this section is to prove Fermat’s Last Theorem for exponent 3:
there are no positive integers x, y, z satisfying x^3 + y^3 = z^3. The earliest
proof is attributed to Euler [12] in 1770 (his proof has a gap that was eventually
closed). We develop a different proof in this section that is a nice application
of the arithmetic in Z[ω].


How to Think About It. The development of the proof is quite technical
(we’ve polished it as much as we were able), but the essential idea is
straightforward and has already been mentioned several times. It’s based on the
factorization of x^3 + y^3 in Z[ω] (see Exercise 3.50 on page 115):

x^3 + y^3 = (x + y)(x + yω)(x + yω^2).

If there are positive integers x, y, z with x^3 + y^3 = z^3, then we’d have a
factorization of z^3 in Z[ω]:

z^3 = (x + y)(x + yω)(x + yω^2).

The primes dividing z all show up with exponent at least 3 in z^3, and the idea is
to show that this can’t happen on the right-hand side. Heuristically, if the three
factors on the right are relatively prime and none is divisible by the square of
a prime, we’re done. But it’s not so easy, mainly because of some mischief
caused by λ = 1 - ω, the prime lying above 3. So, pull out your pencil again
and follow along.


Preliminaries 

Our development will often make use of a fact about Z[ω] adapted from
Corollary 6.37.

Proposition 8.34. If x and y are rational integers that are relatively prime
in Z, then they are relatively prime in Z[ω].

Proof. If π is a prime in Z[ω] dividing both x and y, then N(π) | N(x) and
N(π) | N(y) (see Exercise 8.22 on page 349). That is, N(π) | x^2 and
N(π) | y^2 (for both x and y lie in Z). Now N(π) ∈ Z and N(π) > 1; hence,
if p is a prime factor of N(π), then p is a common factor of x and y, a
contradiction. ■

The prime λ = 1 - ω will figure prominently in the story. In Theorem 8.29,
we saw that λ | 3 and, in fact,

3 = -ω^2 λ^2. (8.6)

That λ lies above 3 implies that a rational integer divisible by λ in Z[ω] is
divisible by 3 in Z. The next lemma explains the ubiquity of λ in the forthcoming
proofs.

Lemma 8.35. If x is a rational integer, then 3 | x in Z if and only if λ | x
in Z[ω].

Proof. If 3 | x in Z, then Eq. (8.6) shows that λ | 3 in Z[ω]; hence, λ | x in
Z[ω].

Conversely, if λ | x, then Exercise 8.22 on page 349 shows that N(λ) |
N(x) in Z. But N(λ) = 3 and N(x) = x^2, so that 3 | x^2 in Z and hence
Euclid’s Lemma gives 3 | x in Z. ■

In Example 8.32, we saw that every element in Z[ω] is congruent mod λ
to 0, 1, or -1. We’ll often need to know “how congruent” an Eisenstein integer
α is to one of these; that is, whether α is divisible by a power of λ. We introduce
notation to capture this idea.

Definition. Define a function v: Z[ω] - {0} → N as follows: if z ∈ Z[ω] is
nonzero and n ≥ 0 is the largest integer with λ^n | z, then v(z) = n. We call v
the valuation.

Some treatments define v(0) to be ∞, but we won’t do that here. Also, a
valuation can be defined in an analogous way for any prime q in a UFD; just
replace λ by q.


Thus, v(z) is the exponent of the highest power of λ dividing z:

v(z) = n if and only if z = λ^n z′ and λ ∤ z′.

Put another way, λ^{v(z)} | z and λ^{v(z)+1} ∤ z.

For example, v(λ) = 1, v(ω) = 0, and v(3) = 2; in Example 8.31, we saw
that v(61 - ω) = 1. Indeed, v(u) = 0 for every unit u.

The valuation v enjoys some properties that come from the properties of
exponentiation. The next proposition reminds us of Exercise 2.15 on page 59.

Proposition 8.36. If z, w are nonzero elements of Z[ω], then

(i) v(zw) = v(z) + v(w).

(ii) If n is a nonnegative integer, then v(z^n) = n v(z).

(iii) v(z ± w) ≥ min{v(z), v(w)}, and if v(z) ≠ v(w), then
v(z ± w) = min{v(z), v(w)}.

Proof. This is Exercise 8.30 on page 358. ■
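The valuation is simple to compute. In the sketch below (our own code; a + bω is stored as the pair (a, b), and the names valuation and eis_mul are hypothetical), we use the identity (a + bω)/(1 - ω) = (a + bω)(2 + ω)/3 = ((2a - b) + (a + b)ω)/3, which holds because the conjugate of λ is 1 - ω^2 = 2 + ω and N(λ) = 3. So λ divides a + bω exactly when 3 | a + b, and in that case the quotient has integer coordinates.

```python
def eis_mul(z, w):
    """(a + b*omega)(c + d*omega), reduced with omega^2 = -1 - omega."""
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + b * c - b * d)

def valuation(z):
    """Largest n with lambda^n dividing z = a + b*omega, where lambda = 1 - omega."""
    a, b = z
    if (a, b) == (0, 0):
        raise ValueError("v(0) is not defined here")
    n = 0
    while (a + b) % 3 == 0:
        # Exact division by lambda: both numerators are multiples of 3.
        a, b = (2 * a - b) // 3, (a + b) // 3
        n += 1
    return n

print(valuation((1, -1)), valuation((0, 1)), valuation((3, 0)), valuation((61, -1)))
```

The four printed values can be compared with the examples v(λ), v(ω), v(3), and v(61 - ω) given in the text.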


How to Think About It. Most proofs of the theorem for exponent 3 are 
broken into two parts: the first case in which 3 doesn’t divide x, y, or z, and 
the second case in which 3 does divide one of them. We’ll follow this program 
and treat the two cases in turn. There are many proofs in the literature; our 
proof of the first case is not the easiest (see Exercise 8.31 on page 358 for a 
fairly simple alternative approach), but we choose it because it generalizes to 
a proof of the first case for any odd prime exponent p when Z[ζ_p] has unique
factorization (see Chapter 1 of [36] for the details). Our proof of the second
case is based on the development in Chapter 17 of [17].


The First Case 

The main result of this section is that there are no positive integers x, y, z with
gcd(x, y) = 1 and 3 ∤ xyz such that

x^3 + y^3 = z^3.

Assuming x and y are relatively prime is no loss in generality: a common prime
factor q of x and y is also a factor of z, so both sides can be divided by q^3,
preserving the relationship; hence, infinite descent would apply. The proof will
be by contradiction, and it will depend on the following lemma.

As we said on page 349, this relatively prime condition leads fairly directly to
the desired proof.


Lemma 8.37. If x, y, and z are integers such that 3 ∤ xyz, gcd(x, y) = 1,
and x^3 + y^3 = z^3, then the Eisenstein integers

x + y, x + ωy, x + ω^2 y

are pairwise relatively prime in Z[ω].


Proof. Suppose that π is a prime in Z[ω] that divides two of the three integers,
say

π | x + ω^i y and π | x + ω^j y,

where 0 ≤ i < j ≤ 2. Then π divides the difference

(x + ω^i y) - (x + ω^j y) = ω^i y (1 - ω^{j-i}).

But, by Exercise 8.21(ii) on page 349, 1 - ω^{j-i} is an associate of λ = 1 - ω,
so that π divides

uy(1 - ω),

where u is some unit in Z[ω]. Hence, by Euclid’s Lemma, π | y or π =
1 - ω = λ (up to a unit factor). Similarly,

π | ω^j (x + ω^i y) - ω^i (x + ω^j y) = ω^i x (ω^{j-i} - 1).

And, because ω^{j-i} - 1 and 1 - ω are associates, we have π | x or π = 1 - ω =
λ. Hence, if π ≠ λ, then π | x and π | y. This implies that x and y have a
common factor in Z[ω]; thus, by Proposition 8.34, they have a common factor
in Z, contradicting the assumption that x and y are relatively prime. So π = λ.
Because x + y - (x + ω^i y) = y(1 - ω^i) ≡ 0 mod λ,

x + y ≡ x + ω^i y mod λ.

We are assuming that x + ω^i y ≡ 0 mod λ; thus, Lemma 8.35 implies that
x + y ≡ 0 mod 3 in Z. But then,

z^3 = x^3 + y^3
≡ x + y mod 3   (Fermat’s Little Theorem)
≡ 0 mod 3;

that is, 3 | z, which contradicts the hypothesis 3 ∤ xyz. ■

The hard work is done. 

Proposition 8.38 (First Case for Exponent 3). There are no positive integers
x, y, z with gcd(x, y) = 1 and 3 ∤ xyz such that

x^3 + y^3 = z^3. (8.7)

Proof. Suppose, on the contrary, that we have positive integers x, y, z as in the
statement. Factoring the left-hand side of Eq. (8.7), we have

(x + y)(x + ωy)(x + ω^2 y) = z^3.

Lemma 8.37 guarantees that the three factors on the left-hand side are pairwise
relatively prime. Hence, by unique factorization in Z[ω], each is a unit times a
cube in that ring: if π is a prime divisor of one factor, say π | x + ωy, then
π | z^3 and, by Euclid’s Lemma, π | z; since π divides neither of the other two
factors, the full power of π dividing z^3, whose exponent is a multiple of 3,
divides x + ωy. Thus, there exists an Eisenstein integer s such that

x + ωy = ±ω^i s^3,

where ±ω^i is one of the six units in Z[ω] (i is 0, 1, or 2). We want to look at
the equation mod 3. Suppose that s = a + bω with a, b ∈ Z. Then

s^3 = a^3 + 3a^2 bω + 3ab^2 ω^2 + b^3 ω^3
= a^3 + 3a^2 bω + 3ab^2 ω^2 + b^3
≡ a^3 + b^3 mod 3;

hence,

x + ωy ≡ ±ω^i n mod 3, (8.8)

where n is a rational integer.

It follows, by taking conjugates (see Exercise 8.22 on page 349), that

x + ω̄y ≡ ±ω̄^i n mod 3.

But ω̄ = ω^{-1}, so that

x + ω^{-1} y ≡ ±ω^{-i} n mod 3. (8.9)

Eqs. (8.8) and (8.9) can be rewritten as:

ω^{-i}(x + ωy) ≡ ±n mod 3
ω^i (x + ω^{-1} y) ≡ ±n mod 3;

hence,

ω^{-i}(x + ωy) ≡ ω^i (x + ω^{-1} y) mod 3.

Multiplying by ω^i gives

x + ωy ≡ ω^{2i}(x + ω^{-1} y) mod 3,

and so

x + ωy - ω^{2i} x - ω^{2i-1} y ≡ 0 mod 3. (8.10)

We claim, for each possible value of i, namely 0, 1, or 2, that Eq. (8.10) leads
to a contradiction.

(i) i = 0: Eq. (8.10) becomes

x + ωy - x - ω^{-1} y ≡ 0 mod 3;

that is,

(ω - ω^{-1}) y ≡ 0 mod 3.

Multiplying both sides by ω^2 gives

(1 - ω) y ≡ 0 mod 3;

that is, there is some α ∈ Z[ω] with λy = 3α. But 3 = -ω^2 λ^2, by
Eq. (8.6), so canceling λ gives y = -ω^2 λα. Thus, λ | y in Z[ω], so that
3 | y in Z, by Lemma 8.35. This contradicts the hypothesis 3 ∤ xyz.

(ii) i = 1: Eq. (8.10) becomes

x + ωy - ω^2 x - ωy ≡ 0 mod 3.

Thus, the ωy’s drop out, and there is α ∈ Z[ω] with x(1 - ω^2) = 3α.
But 1 - ω^2 = -ω^2 λ, by Exercise 8.21 on page 349, and so
-ω^2 λx = -ω^2 λ^2 α. Hence, λ | x in Z[ω], and Lemma 8.35 gives 3 | x in Z,
another contradiction.


Recall that α ≡ β mod 3 means that there is δ ∈ Z[ω] with 3δ = α - β.

This is another example where it’s easier to do things in more generality.
The reason for introducing u comes from the fact that we allow x, y, and z to
be elements of Z[ω], so arithmetic statements are true up to unit factors.


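(iii) i = 2: Eq. (8.10) becomes

x + ωy - ω^4 x - ω^3 y ≡ 0 mod 3.

Because ω^3 = 1, the left-hand side simplifies to (x - y) - ω(x - y) =
λ(x - y). As in parts (i) and (ii), this gives λ | (x - y) in Z[ω], and so
3 | (x - y) in Z, by Lemma 8.35. Now 3 ∤ xyz, so x ≡ y ≡ ±1 mod 3, and
hence x^3 ≡ y^3 ≡ ±1 mod 9 with the same sign; thus, x^3 + y^3 ≡ ±2 mod 9.
But z^3 ≡ ±1 mod 9, because 3 ∤ z, which is a contradiction.

We conclude that there is no solution to x^3 + y^3 = z^3 of the desired type. ■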

Gauss’s Proof of the Second Case 

Gauss gave an elegant proof of the second case of Fermat’s Last Theorem for
exponent 3, and we’ll present it here. It turns out to be convenient to prove a
more general result. The object of this section is to prove the following
theorem.

There are no Eisenstein integers u, x, y, z with xyz ≠ 0, u a unit, and 3 a
factor of exactly one of x, y, z, such that

x^3 + y^3 = uz^3.

The proof, which will use infinite descent, is a consequence of several lemmas
and propositions. To start with, we can assume, for u, x, y, z as in the
statement, that x, y, and z are not all divisible in Z[ω] by λ = 1 - ω (otherwise
there’s a contradiction, for 3 is a divisor in Z of x, y, and z, by Lemma 8.35).
We’ll first prove the theorem in case λ ∤ xy but λ | z. Since x and y are
interchangeable in the hypothesis, the remaining case is λ ∤ yz but λ | x. We’ll
see that the theorem is an easy consequence of this.

Example 8.32 shows that every Eisenstein integer α is congruent mod λ to
0, 1, or -1. In particular, if λ ∤ α, then

α ≡ ±1 mod λ.

Gauss’s proof requires a lemma that shows how an “extra λ” sneaks into the
cube of this congruence.

Lemma 8.39. If a is an Eisenstein integer for which X \ a, then 

a 3 = ±1 mod A 4 . 

Proof. Let’s first consider the case $\alpha \equiv 1 \bmod \lambda$; say

$$\alpha = 1 + \lambda\beta$$

for some $\beta \in \mathbb{Z}[\omega]$. Substitute this into the usual factorization in $\mathbb{Z}[\omega]$:

$$\alpha^3 - 1 = (\alpha - 1)(\alpha - \omega)(\alpha - \omega^2).$$

Rewrite the first factor on the right-hand side: $\alpha - 1 = 1 + \lambda\beta - 1 = \lambda\beta$. Next, since $1 - \omega = \lambda$, we can rewrite the second factor:

$$\alpha - \omega = 1 + \lambda\beta - \omega = \lambda + \lambda\beta = \lambda(1 + \beta).$$

Now rewrite the third factor, using Exercise 8.21 on page 349, which says that $1 - \omega^2 = -\omega^2\lambda$:

$$\alpha - \omega^2 = 1 + \lambda\beta - \omega^2 = \lambda\beta - \omega^2\lambda = \lambda(\beta - \omega^2).$$

Therefore,

$$\alpha^3 - 1 = \lambda^3 \beta(1 + \beta)(\beta - \omega^2). \qquad (8.11)$$

Example 8.32 shows that $\beta$ is 0, $-1$, or 1 mod $\lambda$. In each of these cases, we’ll see that there’s an extra factor of $\lambda$ in the expression on the right-hand side of Eq. (8.11). If $\beta \equiv 0 \bmod \lambda$, then $\beta = \lambda\beta'$ for some $\beta' \in \mathbb{Z}[\omega]$, and the expression begins $\lambda^4\beta'$. If $\beta = -1 + \lambda\beta'$, then the middle factor $1 + \beta$ equals $\lambda\beta'$, which contributes an extra $\lambda$. If $\beta = 1 + \lambda\beta'$, then Exercise 8.21 says that $1 - \omega^2 = -\omega^2\lambda$, and the last factor on the right-hand side becomes

$$\beta - \omega^2 = 1 + \lambda\beta' - \omega^2 = 1 - \omega^2 + \lambda\beta' = -\omega^2\lambda + \lambda\beta' = \lambda(-\omega^2 + \beta').$$

Therefore, if $\alpha \equiv 1 \bmod \lambda$, then $\alpha^3 - 1$ is a multiple of $\lambda^4$; that is, $\alpha^3 \equiv 1 \bmod \lambda^4$.

The remaining case $\alpha \equiv -1 \bmod \lambda$ is now easy. We have $-\alpha \equiv 1 \bmod \lambda$, so that

$$(-\alpha)^3 \equiv 1 \bmod \lambda^4,$$

and so $\alpha^3 \equiv -1 \bmod \lambda^4$. ■
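The congruence in Lemma 8.39 can be spot-checked by machine. The sketch below is only an illustration, not part of the proof. It assumes an encoding that is not in the text: an Eisenstein integer $a + b\omega$ is stored as the pair `(a, b)`, and the helper names are hypothetical. It uses two facts that follow from $\omega = 1 - \lambda$ and $\lambda^2 = -3\omega$: $\lambda$ divides $a + b\omega$ exactly when 3 divides $a + b$, and $\lambda^4 = 9\omega^2$ is an associate of 9.

```python
# Eisenstein integers a + b*omega are stored as pairs (a, b); omega^2 = -1 - omega.

def mul(z, w):
    """Multiply two Eisenstein integers given as (a, b) pairs."""
    a, b = z
    c, d = w
    # (a + b w)(c + d w) = ac + (ad + bc) w + bd w^2, with w^2 = -1 - w
    return (a * c - b * d, a * d + b * c - b * d)

def cube(z):
    return mul(z, mul(z, z))

def divisible_by_lambda(z):
    """lambda = 1 - omega divides a + b*omega exactly when 3 divides a + b."""
    return (z[0] + z[1]) % 3 == 0

def cube_is_pm1_mod_lambda4(z):
    """lambda^4 = 9*omega^2 is an associate of 9, so test both coordinates mod 9."""
    a, b = cube(z)
    return b % 9 == 0 and ((a - 1) % 9 == 0 or (a + 1) % 9 == 0)

# Collect every alpha on a small grid with lambda not dividing alpha
# for which the congruence of Lemma 8.39 fails.
counterexamples = [
    (a, b)
    for a in range(-10, 11)
    for b in range(-10, 11)
    if not divisible_by_lambda((a, b)) and not cube_is_pm1_mod_lambda4((a, b))
]
print(counterexamples)  # an empty list means the lemma held on the whole grid
```

A finite search proves nothing, of course, but it is a quick way to catch a slip in the hand computation.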


Gauss used infinite descent on $\nu(z)$ and showed (as we will shortly) that if there was a solution to $x^3 + y^3 = uz^3$ of the desired type, then one could find another solution $(x', y', z')$ of the same type with $\nu(z') < \nu(z)$. Iterating this process will eventually contradict the next lemma.

Lemma 8.40. Suppose $x^3 + y^3 = uz^3$ for nonzero Eisenstein integers $x$, $y$, $z$. If $\lambda \nmid xy$ and $\lambda \mid z$, then $\lambda^2 \mid z$.

Proof. Since $\lambda \nmid xy$, Euclid’s Lemma in $\mathbb{Z}[\omega]$ says that $\lambda \nmid x$ and $\lambda \nmid y$, and so Lemma 8.39 applies to say that both $x^3$ and $y^3$ are congruent to $\pm 1 \bmod \lambda^4$. Hence, reducing $x^3 + y^3 = uz^3$ mod $\lambda^4$ yields

$$(\pm 1) + (\pm 1) \equiv uz^3 \bmod \lambda^4.$$

(Note that $\lambda \mid z$ implies that $\nu(z) \ge 1$.)

The left-hand side of these congruences is one of 0, 2, or $-2$. Since $\lambda \mid z$ and $\lambda \nmid 2$ (why?), we see that $\pm 2$ are impossible. Thus, $0 \equiv uz^3 \bmod \lambda^4$, so that $\lambda^4 \mid z^3$ and $\nu(z^3) = 3\nu(z) \ge 4$. But $\nu(z)$ is an integer; hence, $\nu(z) \ge 2$ and $\lambda^2 \mid z$. ■
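The valuation $\nu$ used in this proof is easy to compute directly. The following is a minimal sketch under assumptions that are not in the text: an Eisenstein integer $a + b\omega$ is stored as the pair `(a, b)`, and the function names `nu` and `mul` are hypothetical. Division by $\lambda$ uses $(1 - \omega)(1 - \omega^2) = 3$, so that $(a + b\omega)/\lambda = \big((2a - b) + (a + b)\omega\big)/3$.

```python
def mul(z, w):
    """Multiply Eisenstein integers stored as (a, b) = a + b*omega."""
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + b * c - b * d)

def nu(z):
    """Exponent of lambda = 1 - omega in a nonzero Eisenstein integer z."""
    a, b = z
    if (a, b) == (0, 0):
        raise ValueError("nu(0) is not a finite integer")
    count = 0
    while (a + b) % 3 == 0:          # lambda divides a + b*omega
        # exact division by lambda: ((2a - b) + (a + b) omega) / 3
        a, b = (2 * a - b) // 3, (a + b) // 3
        count += 1
    return count

print(nu((2, 0)))   # lambda does not divide 2
print(nu((3, 0)))   # 3 is an associate of lambda^2
print(nu((1, -1)))  # lambda itself

# nu is additive on products, which is what gives nu(z^3) = 3 nu(z)
z = (3, -3)         # z = 3 * lambda
print(nu(z), nu(mul(z, mul(z, z))))
```

In particular, since $\nu(z^3)$ is always a multiple of 3, the inequality $\nu(z^3) \ge 4$ can only hold when $\nu(z) \ge 2$, which is the last step of the proof.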


Here’s the main piece of the puzzle: the key step for infinite descent. 

Proposition 8.41. Suppose that $u$ is a unit in $\mathbb{Z}[\omega]$ and $x^3 + y^3 = uz^3$ for Eisenstein integers $x$, $y$, $z$ with $\lambda \nmid xy$ and $\lambda \mid z$. Then there exists a unit $u'$ and $x', y', z' \in \mathbb{Z}[\omega]$ with $\lambda \nmid x'y'$ and $\nu(z') = \nu(z) - 1$, such that

$$(x')^3 + (y')^3 = u'(z')^3.$$

Before we dig into the proof, think about why this result, combined with Lemma 8.40, implies that there is no solution to $x^3 + y^3 = uz^3$ in Eisenstein integers with $\lambda \nmid xy$ and $\lambda \mid z$.

Recall the role of the factor $x + y$ in the proof of the first case.


Proof. Given $x, y, z$ as in the statement, we factor $x^3 + y^3$ to get

$$(x + y)(x + y\omega)(x + y\omega^2) = uz^3. \qquad (8.12)$$

Lemma 8.40 implies that $\lambda^2 \mid z$, so that $\nu(uz^3) \ge 6$. Hence at least one factor on the left-hand side of the above equation is divisible by $\lambda^2$. Because $x$, $y$, and $z$ are Eisenstein integers, we can replace $y$ by $y\omega$ or $y\omega^2$ without changing the equation or the claim of the proposition. Hence we can assume, without loss of generality, that $\lambda^2 \mid x + y$; that is, $\nu(x + y) \ge 2$.

Now,

$$x + y\omega = (x + y) - (1 - \omega)y = x + y - \lambda y$$

and, since $\lambda \nmid y$, we have $\nu(\lambda y) = 1$. Hence, $\nu(x + y\omega) = 1$, by Proposition 8.36(iii). Similarly, $\nu(x + y\omega^2) = 1$. So, applying $\nu$ to Eq. (8.12) and using Proposition 8.36, we have:

$$3\nu(z) = \nu(x + y) + \nu(x + y\omega) + \nu(x + y\omega^2) = \nu(x + y) + 1 + 1,$$

so that

$$\nu(x + y) = 3\nu(z) - 2.$$

For convenience, let’s call the right-hand side $k$:

$$k = 3\nu(z) - 2. \qquad (8.13)$$

The factors on the left-hand side of Eq. (8.12) are each divisible by $\lambda$. We claim that they can’t have any other common factors. To see this, suppose that $\gamma$ is a prime in $\mathbb{Z}[\omega]$ that is not an associate of $\lambda$. If $\gamma$ divided $x + y$ and $x + y\omega$, then it would divide their difference, which is $\lambda y$. By Euclid’s Lemma, $\gamma \mid y$, but then $\gamma \mid x$, contradicting the fact that $\gcd(x, y) = 1$. Hence

$$\gcd(x + y,\, x + y\omega) = \lambda.$$

The same reasoning shows that the gcd of each of the other pairs of factors is $\lambda$.

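Putting all this together, we have the following equation in $\mathbb{Z}[\omega]$:

$$\left(\frac{x + y}{\lambda^k}\right)\left(\frac{x + y\omega}{\lambda}\right)\left(\frac{x + y\omega^2}{\lambda}\right) = u\left(\frac{z}{\lambda^{\nu(z)}}\right)^3, \qquad (8.14)$$

where the three factors on the left-hand side are relatively prime [remember Eq. (8.13): $3\nu(z) = k + 2$].

Now invoke unique factorization in $\mathbb{Z}[\omega]$: the right-hand side of Eq. (8.14) is a cube, and the left-hand side is a product of three relatively prime factors (each having no factor $\lambda$). Hence, each is a cube and, more precisely, there are units $u_1, u_2, u_3$ and Eisenstein integers $z_1, z_2, z_3$ with

$$\frac{x + y}{\lambda^k} = u_1 z_1^3 \quad\text{and } \lambda \nmid z_1,$$
$$\frac{x + y\omega}{\lambda} = u_2 z_2^3 \quad\text{and } \lambda \nmid z_2,$$
$$\frac{x + y\omega^2}{\lambda} = u_3 z_3^3 \quad\text{and } \lambda \nmid z_3.$$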
Clearing fractions, it follows that

$$x + y = u_1 \lambda^k z_1^3 \quad\text{where } \lambda \nmid z_1,$$
$$x + y\omega = u_2 \lambda z_2^3 \quad\text{where } \lambda \nmid z_2,$$
$$x + y\omega^2 = u_3 \lambda z_3^3 \quad\text{where } \lambda \nmid z_3.$$

Multiply the second of these equations by $\omega$, the third by $\omega^2$, and add them to the first. The left-hand sides add up to $x(1 + \omega + \omega^2) + y(1 + \omega^2 + \omega^4) = 0$, so we obtain

$$0 = u_1 \lambda^k z_1^3 + u_2 \omega \lambda z_2^3 + u_3 \omega^2 \lambda z_3^3.$$

Divide both sides by $\lambda$:

$$0 = u_1 \lambda^{k-1} z_1^3 + u_2 \omega z_2^3 + u_3 \omega^2 z_3^3.$$

We’re almost there. Letting $v_1 = -u_1$, $v_2 = u_2\omega$, and $v_3 = u_3\omega^2$, we have

$$v_2 z_2^3 + v_3 z_3^3 = v_1 \lambda^{k-1} z_1^3.$$

Recalling that $k = 3\nu(z) - 2$, so that $k - 1 = 3(\nu(z) - 1)$, this can be written as

$$v_2 z_2^3 + v_3 z_3^3 = v_1 \left(\lambda^{\nu(z) - 1} z_1\right)^3, \quad\text{where } \lambda \nmid z_1 z_2 z_3. \qquad (8.15)$$

Divide both sides by $v_2$, relabel everything, and we have the equation

$$(x')^3 + v\,(y')^3 = v'(z')^3,$$

where $\lambda \nmid x'y'$, $v$ and $v'$ are units, and $\nu(z') = \nu(z) - 1$. Now $\lambda^2$ divides the right-hand side of the equation, by Lemma 8.40, so reducing the equation mod $\lambda^2$ yields

$$(\pm 1) + (\pm v) \equiv 0 \bmod \lambda^2.$$

Once again, trying all six Eisenstein units and all possible signs, you can check that $v = \pm 1$. Replacing $y'$ by $-y'$ if necessary, we have

$$(x')^3 + (y')^3 = u'(z')^3,$$

where $u' = v'$ is a unit, $\lambda \nmid x'y'$, and $\nu(z') = \nu(z) - 1$, and this is what we wanted to show. ■
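Three identities carry the bookkeeping in this proof: the factorization of $x^3 + y^3$, the relation $x + y\omega = (x + y) - \lambda y$, and the fact that the three factors, weighted by $1$, $\omega$, $\omega^2$, sum to zero. The sketch below checks them on a grid of Eisenstein integers. The pair encoding `(a, b)` for $a + b\omega$ and all function names are assumptions made for this illustration.

```python
OMEGA = (0, 1)
OMEGA2 = (-1, -1)          # omega^2 = -1 - omega
LAMBDA = (1, -1)           # lambda = 1 - omega

def add(z, w):
    return (z[0] + w[0], z[1] + w[1])

def neg(z):
    return (-z[0], -z[1])

def mul(z, w):
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + b * c - b * d)

def cube(z):
    return mul(z, mul(z, z))

def identities_hold(x, y):
    f1 = add(x, y)                    # x + y
    f2 = add(x, mul(y, OMEGA))        # x + y*omega
    f3 = add(x, mul(y, OMEGA2))       # x + y*omega^2
    factorization = mul(f1, mul(f2, f3)) == add(cube(x), cube(y))
    shift = f2 == add(f1, neg(mul(LAMBDA, y)))
    weighted_sum = add(f1, add(mul(OMEGA, f2), mul(OMEGA2, f3))) == (0, 0)
    return factorization and shift and weighted_sum

grid = [(a, b) for a in range(-3, 4) for b in range(-3, 4)]
print(all(identities_hold(x, y) for x in grid for y in grid))
```

The weighted sum vanishing is what turns the three cube equations into a single equation of the same shape with a smaller $\nu(z')$.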

Proposition 8.42. There are no Eisenstein integers $u, x, y, z$ with $u$ a unit, $\lambda \nmid xy$, and $\lambda \mid z$, such that

$$x^3 + y^3 = uz^3.$$

Proof. Suppose such elements $u, x, y, z$ exist. Repeated use of Proposition 8.41 shows that there are elements $u', x', y', z'$ with $\nu(z') < 2$. But Lemma 8.40 says that this is impossible.

It remains to settle the case where $\lambda \nmid yz$. If you’ve held on this long, there’s a relatively simple finish: Given Eisenstein integers $u, x, y, z$ with $u$ a unit, $\lambda \mid x$, and $\lambda \nmid yz$ and

$$x^3 + y^3 = uz^3,$$

reduce mod $\lambda^2$ to obtain $\pm 1 \equiv u \bmod \lambda^2$. A check shows that $u = \pm 1$. But then

$$(\pm z)^3 + (-y)^3 = x^3,$$

and we can apply Proposition 8.41. ■

This establishes Gauss’s Theorem.

Note that $v_1$, $v_2$, and $v_3$ are all units.

Once again, we use infinite descent.

See Exercise 8.25 on page 349.
Theorem 8.43 (Gauss). There are no Eisenstein integers $u, x, y, z$ with $u$ a unit, $xyz \ne 0$, and 3 a factor of exactly one of $x$, $y$, and $z$, such that

$$x^3 + y^3 = uz^3.$$

Proof. Since $\lambda \mid 3$, the hypothesis in Proposition 8.42 that $\lambda$ is a factor of exactly one of $x$, $y$, and $z$ implies that 3 is a factor of exactly one of $x$, $y$, and $z$. ■


After all this work, we have, as a simple corollary, what we wanted in the 
first place. 

Theorem 8.44 (Fermat’s Last Theorem for Exponent 3). There are no positive integers $x, y, z$ such that $x^3 + y^3 = z^3$.

Proof. Since $\lambda \mid 3$, Proposition 8.38 and Theorem 8.43 (with $u = 1$) cover all the possible cases for $x, y, z$. ■

Proving Fermat’s Last Theorem for a given exponent n was split into two 
cases, as we have just seen for n = 3; the second case was also divided into 
two parts. The first case for all n < 100 was proved, around 1806, by Germain. 
In 1825, Legendre proved one part of the second case for n = 5, while Dirichlet
proved the other part. In 1839, Lamé proved Fermat’s Last Theorem for
exponent n = 7. The level of difficulty increased with the exponent. It was not 
until Kummer that many exponents were completely settled simultaneously. 


Exercises 

8.29 Show that none of the six units $u$ in $\mathbb{Z}[\omega]$ is congruent mod $\lambda$ to 0, 2, or $-2$. (As usual, $\lambda = 1 - \omega$.)

8.30 Prove Proposition 8.36.

8.31 Without using Proposition 8.38, show that there are no integers $x, y, z$ with $3 \nmid xyz$ such that $x^3 + y^3 \equiv z^3 \bmod 9$. This exercise gives an alternative proof of Proposition 8.38.

8.32 Show that there are no integers $x, y, z$ with $5 \nmid xyz$ such that $x^5 + y^5 \equiv z^5 \bmod 25$. This exercise implies Fermat’s Last Theorem for exponent 5 in the case that $5 \nmid xyz$.

8.33 Are there any integers $x, y, z$ with $7 \nmid xyz$ such that $x^7 + y^7 \equiv z^7 \bmod 49$?

8.34 (i) Sketch the graph of $x^3 + y^3 = 1$.

(ii) Show that the only rational points on the graph are $(1, 0)$ and $(0, 1)$.

8.35 Take It Further. Let $G$ be the graph of $x^3 + y^3 = 9$.

(i) Sketch $G$.

(ii) Find the equation of the line $\ell$ tangent to $G$ at $(2, 1)$.

(iii) Find the intersection of $\ell$ and $G$.

(iv) Show that there are infinitely many triples of integers $(x, y, z)$ such that

$$x^3 + y^3 = 9z^3.$$
8.4 Approaches to the General Case


Almost all attempts to prove there are no positive integers $x, y, z$ satisfying $x^p + y^p = z^p$, where $p$ is an odd prime, divided the problem in half. The first case assumes that $\gcd(x, y) = 1$ and $p \nmid xyz$; the second case assumes that exactly one of $x$, $y$, and $z$ is divisible by $p$.

Our choice of proof for exponent 3 contains some of the main ingredients of a proof of the first case for any odd prime $p$, provided that the ring $\mathbb{Z}[\zeta_p]$, where $\zeta_p = \cos(2\pi/p) + i\sin(2\pi/p)$, is a UFD. Once again, this is based on the factorization in Exercise 3.50 on page 115:

$$x^p + y^p = (x + y)(x + \zeta_p y) \cdots (x + \zeta_p^{p-1} y). \qquad (8.16)$$

The basic idea is to use the fact, in a UFD, that if a product of relatively prime elements is a $p$th power, then each of its factors is also a $p$th power. The proof is more complicated for large $p$ because, while $\mathbb{Z}[\zeta_3] = \mathbb{Z}[\omega]$ has only six units, the ring $\mathbb{Z}[\zeta_p]$ for $p > 3$ may have infinitely many units. As we saw in the proof of Proposition 8.38, much of the argument depends on a careful analysis of how units enter into the calculations.

The commutative rings $\mathbb{Z}[\zeta_p]$ are called rings of cyclotomic integers, and investigating them has played an important part in the story of Fermat’s Last Theorem, well into the 20th century. We’ll start this section with a brief sketch of arithmetic in $\mathbb{Z}[\zeta_p]$, pointing to some major results, perhaps without proof, that generalize results we’ve already established for $\mathbb{Z}[\omega]$.

After that, we’ll sketch the work of Kummer that deals with the situation when unique factorization fails. While these efforts didn’t lead him to a proof of Fermat’s Last Theorem, they did lead to some ideas that have had real staying power in algebra. One of them is his introduction of ideals as an important structural component of a commutative ring (Kummer called them divisors), not merely as subsets that happen to arise, say in studying gcd’s. Another important idea is that of class number, a measure of how far off $\mathbb{Z}[\zeta_p]$ is from having unique factorization.

Here is a biography of Kummer we have adapted from that given in the 
history archives of the School of Mathematics and Statistics of the University 
of St. Andrews in Scotland. 

Ernst Eduard Kummer was born in Sorau, Prussia, in 1810. He entered the 
University of Halle in 1828 with the intention of studying Protestant theology, 
but he received mathematics teaching as part of his degree which was designed 
to provide a proper foundation to the study of philosophy. Kummer’s mathe- 
matics lecturer H. F. Scherk inspired his interest in mathematics, and Kummer 
soon was studying mathematics as his main subject. 

In 1831 Kummer was awarded a prize for a mathematical essay he wrote on 
a topic set by Scherk. In the same year he was awarded his certificate enabling 
him to teach in schools and, on the strength of his prize-winning essay, he 
was awarded a doctorate. In 1832, Kummer was appointed to a teaching post 
at the Gymnasium in Liegnitz, now Legnica in Poland. He held this post for
ten years, where he taught mathematics and physics. Some of his pupils had 
great ability and, conversely, they were extremely fortunate to find a school 
teacher of Kummer’s quality and ability to inspire. His two most famous pupils 
were Kronecker and Joachimsthal and, under Kummer’s guidance, they began 
mathematical research while at school, as did Kummer himself. He published
a paper on hypergeometric series in Crelle’s Journal in 1836, which he sent to
Jacobi, and this led to Jacobi, and later Dirichlet, corresponding with Kummer.
In 1839, although still a school teacher, Kummer was elected to the Berlin
Academy on Dirichlet’s recommendation. Jacobi now realized that he had to
find Kummer a university professorship.

Joachimsthal was famed for the high quality of his lectures. His colleagues in Berlin included many famous mathematicians such as Eisenstein, Dirichlet, Jacobi, Steiner, and Borchardt.

In 1842, with strong support from Jacobi and Dirichlet, Kummer was appointed
a full professor at the University of Breslau, now Wrocław in Poland,
where he began research in number theory. In 1855, Dirichlet left Berlin to
succeed Gauss at Göttingen, and he recommended that Berlin offer the vacant
chair to Kummer, which they did. The clarity and vividness of Kummer’s
presentations brought him great numbers of students — as many as 250 were 
counted at his lectures. Kummer’s popularity as a professor was based not 
only on the clarity of his lectures but on his charm and sense of humor as well. 
Moreover, he was concerned for the well-being of his students and willingly 
aided them when material difficulties arose. 

During Kummer’s first period of mathematics, he worked on function the- 
ory. He extended Gauss’s work on hypergeometric series, giving developments 
that are useful in the theory of differential equations. He was the first to com- 
pute the monodromy groups of these series. In 1843 Kummer, realizing that at- 
tempts to prove Fermat’s Last Theorem broke down because the unique factor- 
ization of integers did not extend to other rings of complex numbers, attempted 
to restore the uniqueness of factorization by introducing “ideal” numbers. Not 
only has his work been most fundamental in work relating to Fermat’s Last 
Theorem, since all later work was based on it for many years, but the con- 
cept of an ideal allowed ring theory, and much of abstract algebra, to develop. 
The Paris Academy of Sciences awarded Kummer the Grand Prize in 1857 for 
this work. Soon after, he was elected to membership of the Paris Academy of 
Sciences and then, in 1863, he was elected a Fellow of the Royal Society of 
London. Kummer received numerous other honors in his long career; he died 
in 1893. 


Cyclotomic integers 

We shall assume throughout this section that $p$ is an odd prime and $\zeta = \zeta_p = \cos(2\pi/p) + i\sin(2\pi/p)$. Recall some facts about $\mathbb{Q}(\zeta)$.

(1) $\operatorname{irr}(\zeta, \mathbb{Q}) = \Phi_p(x) = 1 + x + x^2 + \cdots + x^{p-2} + x^{p-1}$ (Theorem 6.68 and Exercise 7.31 on page 300).

(2) $[\mathbb{Q}(\zeta_p) : \mathbb{Q}] = p - 1$ (Exercise 7.32 on page 300).

(3) $x^p - 1 = (x - 1)(x - \zeta)(x - \zeta^2) \cdots (x - \zeta^{p-1})$ (Exercise 6.46(i) on page 269).

(4) $\mathbb{Q}(\zeta) \cong \mathbb{Q}[x]/(\Phi_p(x))$ (Theorem 7.25(i)).

We recall Proposition 7.20(v), which we now state as a lemma for your convenience.

Lemma 8.45. Let $p$ be an odd prime and $\zeta = \zeta_p$ be a $p$th root of unity. A basis for $\mathbb{Q}(\zeta)$ as a vector space over $\mathbb{Q}$ is

$$B = 1, \zeta, \zeta^2, \ldots, \zeta^{p-2}.$$




The ring $\mathbb{Z}[\zeta] \subseteq \mathbb{Q}(\zeta)$ is thus the set of all linear combinations $\sum_{i=0}^{p-2} a_i \zeta^i$ with $a_i \in \mathbb{Z}$. It shares many of the algebraic properties of the Gaussian and Eisenstein integers except, alas, it is not always a UFD (more about this in the next section). But there are analogs for the laws of decomposition that we developed in $\mathbb{Z}[i]$ and in $\mathbb{Z}[\omega]$. Recall, for example, that there is equality of ideals in $\mathbb{Z}[i]$:

$$(2) = (1 - i)^2,$$

and also in $\mathbb{Z}[\omega]$,

$$(3) = (1 - \omega)^2.$$

It turns out that the ideal $(p)$ ramifies in $\mathbb{Z}[\zeta]$ in a similar way. Let’s look into this.


Lemma 8.46. If $p$ is an odd prime and $\zeta = \zeta_p$ is a $p$th root of unity, then

$$p = \prod_{k=1}^{p-1} (1 - \zeta^k).$$

Proof. Since $x^p - 1 = (x - 1)(x - \zeta)(x - \zeta^2) \cdots (x - \zeta^{p-1})$, we have

$$\frac{x^p - 1}{x - 1} = \prod_{k=1}^{p-1} (x - \zeta^k).$$

But $(x^p - 1)/(x - 1) = \Phi_p(x) = 1 + x + \cdots + x^{p-1}$, so that

$$1 + x + \cdots + x^{p-1} = \prod_{k=1}^{p-1} (x - \zeta^k).$$

Now put $x = 1$. ■
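A quick floating-point check of Lemma 8.46 may be reassuring. This is only a numerical sketch (the function name is hypothetical, and rounding error means the products come out as complex numbers very close to $p$, not exactly $p$).

```python
import cmath

def product_one_minus_zeta_powers(p):
    """Numerically evaluate the product of (1 - zeta^k) for k = 1, ..., p - 1."""
    zeta = cmath.exp(2j * cmath.pi / p)   # zeta = cos(2 pi/p) + i sin(2 pi/p)
    prod = 1 + 0j
    for k in range(1, p):
        prod *= 1 - zeta ** k
    return prod

for p in (3, 5, 7, 11, 13):
    value = product_one_minus_zeta_powers(p)
    # the real part should be close to p and the imaginary part close to 0
    print(p, round(value.real, 9), round(value.imag, 9))
```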

Lemma 8.46 gives a factorization of $p$ in $\mathbb{Z}[\zeta]$ into factors. Our next goal
is to show that the factors are all associates.


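Proposition 8.47. If $s, r \in \mathbb{N}$ and $p \nmid sr$, then $1 - \zeta^s$ and $1 - \zeta^r$ are associates in $\mathbb{Z}[\zeta]$.

Proof. In the field $\mathbb{F}_p$, let $t = r^{-1}s$, so that $tr \equiv s \bmod p$. Then $\zeta^{tr} = \zeta^s$ (why?), and so

$$\frac{1 - \zeta^s}{1 - \zeta^r} = \frac{1 - (\zeta^r)^t}{1 - \zeta^r} = 1 + \zeta^r + (\zeta^r)^2 + \cdots + (\zeta^r)^{t-1}.$$

Hence, $(1 - \zeta^s)/(1 - \zeta^r) \in \mathbb{Z}[\zeta]$, and

$$(1 - \zeta^r) \mid (1 - \zeta^s)$$

in $\mathbb{Z}[\zeta]$. A similar argument shows that $(1 - \zeta^s) \mid (1 - \zeta^r)$ in $\mathbb{Z}[\zeta]$. It follows from Proposition 6.6 that $1 - \zeta^s$ and $1 - \zeta^r$ are associates in $\mathbb{Z}[\zeta]$. ■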
As an immediate consequence, we can generalize the fact that there is equality of ideals: $(3) = (1 - \omega)^2$.

The next result says that $p$ ramifies in $\mathbb{Z}[\zeta]$.

Corollary 8.48. In $\mathbb{Z}[\zeta]$, there is a unit $u$ such that

$$p = u(1 - \zeta)^{p-1},$$

which gives a factorization of ideals

$$(p) = (1 - \zeta)^{p-1}.$$

Proof. Lemma 8.46 shows, as elements of $\mathbb{Z}[\zeta]$, that

$$p = \prod_{k=1}^{p-1} (1 - \zeta^k) = (1 - \zeta)(1 - \zeta^2) \cdots (1 - \zeta^{p-1}). \qquad (8.17)$$

Proposition 8.47 shows that there is a unit $u_k$ ($2 \le k \le p - 1$) so that

$$1 - \zeta^k = u_k (1 - \zeta).$$

Factoring out the units from Eq. (8.17) and writing their product as $u$, we see that

$$p = u(1 - \zeta)^{p-1}.$$

Hence, we have equality of ideals in $\mathbb{Z}[\zeta]$:

$$(p) = (1 - \zeta)^{p-1}. \; ■$$

Corollary 8.49. If $s, r \in \mathbb{N}$ and $p \nmid sr$, then

$$\frac{1 - \zeta^s}{1 - \zeta^r}$$

is a unit in $\mathbb{Z}[\zeta]$.

Proof. Since $1 - \zeta^s$ and $1 - \zeta^r$ are associates, there is a unit $u$ in $\mathbb{Z}[\zeta]$ with $1 - \zeta^s = u(1 - \zeta^r)$, so $(1 - \zeta^s)/(1 - \zeta^r) = u$. ■


How to Think About It. Since $\zeta^p = 1$, every integer power of $\zeta$ occurs among

$$1, \zeta, \zeta^2, \ldots, \zeta^{p-1}.$$

In particular, if $1 \le s \le p - 1$, then $\zeta^{-s} = \zeta^{p-s}$. We can calculate in $\mathbb{Z}[\zeta]$ by calculating in

$$\mathbb{Z}[x]/(x^{p-1} + x^{p-2} + \cdots + 1).$$

This allows us to use a CAS to do calculations and then to translate to $\mathbb{Z}[\zeta]$ via the map $f(x) \mapsto f(\zeta)$.
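In place of a CAS, a few lines of code suffice for this kind of calculation. The sketch below is one possible encoding, not the book's: an element of $\mathbb{Z}[\zeta]$ is a list of integer coefficients, with `coeffs[i]` the coefficient of $\zeta^i$, and the function names are hypothetical. Reduction uses $\zeta^p = 1$ and then $\zeta^{p-1} = -(1 + \zeta + \cdots + \zeta^{p-2})$. As an illustration of Corollary 8.49 with $p = 7$: the unit $(1 - \zeta^2)/(1 - \zeta) = 1 + \zeta$ should have inverse $(1 - \zeta)/(1 - \zeta^2) = 1 + \zeta^2 + \zeta^4 + \zeta^6$, because $2 \cdot 4 \equiv 1 \bmod 7$.

```python
def reduce_mod_phi(coeffs, p):
    """Rewrite sum(coeffs[i] * zeta^i) in the basis 1, zeta, ..., zeta^(p-2)."""
    folded = [0] * p
    for i, c in enumerate(coeffs):
        folded[i % p] += c                 # zeta^p = 1
    top = folded[p - 1]
    # zeta^(p-1) = -(1 + zeta + ... + zeta^(p-2))
    return [c - top for c in folded[:p - 1]]

def cyc_mul(u, v, p):
    """Multiply two elements of Z[zeta_p] given as coefficient lists."""
    product = [0] * (len(u) + len(v) - 1)
    for i, a in enumerate(u):
        for j, b in enumerate(v):
            product[i + j] += a * b
    return reduce_mod_phi(product, p)

p = 7
one_plus_zeta = [1, 1]                       # 1 + zeta
candidate_inverse = [1, 0, 1, 0, 1, 0, 1]    # 1 + zeta^2 + zeta^4 + zeta^6
print(cyc_mul(one_plus_zeta, candidate_inverse, p))  # coefficients of the product
```

A result of `[1, 0, 0, 0, 0, 0]` represents the element 1, confirming that the two elements are inverse units.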
There are other units in $\mathbb{Z}[\zeta]$ that are real numbers; Corollary 8.49 gives a way to produce them.

Proposition 8.50. The real number

$$\zeta + \zeta^{-1} = 2\cos\left(\frac{2\pi}{p}\right)$$

is a unit in $\mathbb{Z}[\zeta]$.

Proof. Exercise 6.46 on page 269 shows that, in $\mathbb{Z}[\zeta][x]$,

$$x^p + 1 = (x + 1)(x + \zeta)(x + \zeta^2) \cdots (x + \zeta^{p-1}).$$

Put $x = \zeta^{-1}$ to find that

$$(\zeta^{-1})^p + 1 = (\zeta^{-1} + 1)(\zeta^{-1} + \zeta)(\zeta^{-1} + \zeta^2) \cdots (\zeta^{-1} + \zeta^{p-1}). \qquad (8.18)$$

The second factor on the right-hand side of Eq. (8.18) is the focus of the proposition. The left-hand side is 2 (because $(\zeta^{-1})^p = 1$). Finally, the last factor on the right-hand side is

$$\zeta^{-1} + \zeta^{p-1} = \zeta^{-1} + \zeta^{-1} = 2\zeta^{-1}.$$

Hence

$$2 = 2\zeta^{-1}(\zeta^{-1} + 1)(\zeta^{-1} + \zeta)(\zeta^{-1} + \zeta^2) \cdots (\zeta^{-1} + \zeta^{p-2}).$$

Dividing both sides by 2, we see that $\zeta^{-1} + \zeta$ is a unit; in fact, its inverse is given by:

$$1 = (\zeta^{-1} + \zeta)\left[\zeta^{-1}(\zeta^{-1} + 1)(\zeta^{-1} + \zeta^2) \cdots (\zeta^{-1} + \zeta^{p-2})\right].$$

(The last equation gives us other units besides $\zeta + \zeta^{-1}$.) ■


Further results. This is just the beginning. 

• Corollary 8.48 is a piece of a law of decomposition in $\mathbb{Z}[\zeta]$. In $\mathbb{Z}[\omega]$, rational primes either stay prime, split, or ramify (and 3 is the only ramified prime). In $\mathbb{Z}[\zeta_p]$ for $p > 3$, rational primes can decompose in other ways, but it’s still true that the way a prime decomposes depends only on its congruence class mod $p$. Indeed, if $q \ne p$ is a prime and $f$ is the smallest positive integer such that $q^f \equiv 1 \bmod p$, then $q$ splits into $(p - 1)/f$ prime factors in $\mathbb{Z}[\zeta]$. This lovely theory is detailed in [5] Chapter 3 and [36] Chapter 2.

• Corollary 8.49 and Proposition 8.50 show how to build units in $\mathbb{Z}[\zeta]$. This is a piece of a complete classification of units in cyclotomic integers: Kummer proved that every unit in $\mathbb{Z}[\zeta]$ is a product $\zeta^s \varepsilon$ for some integer $s$, where $\varepsilon \in R = \mathbb{Z}[\zeta + \zeta^{-1}]$ (for a proof, see [36], p.3). Since

$$\zeta + \zeta^{-1} = 2\cos(2\pi/p) \in \mathbb{R},$$

$R$ is a subring of $\mathbb{R}$, and every unit in $\mathbb{Z}[\zeta]$ is the product of a power of $\zeta$ and a real unit of $\mathbb{Z}[\zeta]$. For $p = 3$, $\zeta = \omega = \frac{1}{2}\left(-1 + i\sqrt{3}\right)$ and

$$\zeta + \zeta^{-1} = -1.$$

Hence $R = \mathbb{Z}$, and every unit is a power of $\omega$ times a unit in $\mathbb{Z}$, namely $\pm 1$; this recovers the result from Exercise 4.45 on page 165.
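The decomposition law in the first bullet can be tabulated with a short computation. The sketch below assumes the standard count, stated here as an assumption rather than proved: for a prime $q \ne p$, if $f$ is the order of $q$ in the multiplicative group mod $p$, then $q$ has $(p - 1)/f$ prime (ideal) factors in $\mathbb{Z}[\zeta_p]$. The function names are hypothetical.

```python
def multiplicative_order(q, p):
    """Smallest f >= 1 with q**f congruent to 1 mod p (p prime, p not dividing q)."""
    if q % p == 0:
        raise ValueError("q must not be divisible by p")
    f, power = 1, q % p
    while power != 1:
        power = (power * q) % p
        f += 1
    return f

def number_of_prime_factors(q, p):
    """Assumed count of prime factors of q in Z[zeta_p]: (p - 1) / f."""
    return (p - 1) // multiplicative_order(q, p)

# p = 3 (Eisenstein integers): 7 splits into two factors, 5 and 2 stay prime
for q in (7, 5, 2):
    print(3, q, number_of_prime_factors(q, 3))

# p = 7: 29 is congruent to 1 mod 7, so it splits completely into 6 factors
for q in (29, 2, 3):
    print(7, q, multiplicative_order(q, 7), number_of_prime_factors(q, 7))
```

For $p = 3$ this reproduces the familiar rule in $\mathbb{Z}[\omega]$: primes congruent to 1 mod 3 split, and primes congruent to 2 mod 3 stay prime.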


It follows that if $q \equiv 1 \bmod p$, then $q$ “splits completely” into $p - 1$ factors in $\mathbb{Z}[\zeta]$. What does this say in $\mathbb{Z}[\omega]$ (when $p = 3$)?
The results in this section set the stage for a proof of Fermat’s Last Theorem, along the same lines as our proof of the theorem for exponent 3, for arbitrary prime exponents $p$, as long as $\mathbb{Z}[\zeta_p]$ has unique factorization. Kummer did exactly this, for both cases of the theorem (a detailed historical account is in [23]). As in the case $p = 3$, the key players are Eq. (8.16), the prime $\lambda = 1 - \zeta$, and the units $\zeta^s \varepsilon$ where $\varepsilon$ is a real unit in $\mathbb{Z}[\zeta]$. We leave the story here, pointing to [5] Chapter 3 for the rest of the technical details.


Exercises 

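8.36 As usual, let $\zeta = \cos(2\pi/p) + i\sin(2\pi/p)$, where $p$ is a rational prime.

(i) Show that $[\mathbb{Q}(\zeta) : \mathbb{Q}(\zeta + \zeta^{-1})] = 2$, and find

$$\operatorname{irr}(\zeta, \mathbb{Q}(\zeta + \zeta^{-1})).$$

(ii) What is $[\mathbb{Q}(\zeta + \zeta^{-1}) : \mathbb{Q}]$?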

8.37 (i) Experiment with various values of $p$ and calculate

$$\prod_{i=1}^{p-1} (1 + \zeta^i).$$

(ii) Find a general formula (for any prime $p$) for

$$\prod_{i=1}^{p-1} (1 + \zeta^i).$$

8.38 For $1 \le s \le p - 1$ ($p$ a prime), show that

$$\zeta_p^s + \zeta_p^{-s}$$

is a unit in $\mathbb{Z}[\zeta_p]$. Is $\zeta_p^s + \zeta_p^{-s}$ a real number?

8.39 In $\mathbb{Z}[x]/(\Phi_5(x))$, calculate

(i) $x^4 (x^4 + 1)(x^4 + x^2)(x^4 + x^3)$

(ii) $(x + x^4)(x^3 + x^2)$

(iii) $(x + x^4)(1 + x)$.

8.40 In $\mathbb{Z}[\zeta_5]$, calculate

(i) $\zeta^4 (\zeta^4 + 1)(\zeta^4 + \zeta^2)(\zeta^4 + \zeta^3)$

(ii) $(\zeta + \zeta^4)(\zeta^3 + \zeta^2)$

(iii) $(\zeta + \zeta^4)(1 + \zeta)$.

8.41 Write $1 + \zeta_5$ as the product of a power of $\zeta_5$ and an element of $\mathbb{Z}[\zeta_5 + \zeta_5^{-1}]$.

8.42 Show that

$$\zeta_5^{-1} \cdot \frac{1 - \zeta_5^3}{1 - \zeta_5} = \frac{\sin(3\pi/5)}{\sin(\pi/5)}.$$
Kummer, Ideal Numbers, and Dedekind

It is natural to think that the rings $\mathbb{Z}[\zeta_p]$ are UFDs, as evidenced by the number of mathematicians in the 17th, 18th, and 19th centuries who assumed it. Indeed, it’s true for all primes less than 23, but $\mathbb{Z}[\zeta_{23}]$ does not enjoy unique factorization ([23], p.7). How could so many not know this? It may seem that 23 is not that large, but the calculations in rings of cyclotomic integers are hefty, even with computers. Imagine the stamina required to calculate by hand with polynomials of degree 22 in $\zeta_{23}$. Some of Kummer’s tour-de-force calculations are recounted in [11] Chapter 4. The proof that unique factorization fails in $\mathbb{Z}[\zeta_{23}]$ is technical (again, see [11], Chapter 4), but the essential idea can be illustrated in the ring $R = \mathbb{Z}[\sqrt{-5}]$. This is a perfectly good commutative ring (of course, $R$ is not a ring of cyclotomic integers):

$$R = \mathbb{Z}[\sqrt{-5}] \cong \mathbb{Z}[x]/(x^2 + 5).$$

If we let $\alpha = \sqrt{-5}$, then elements of $R$ can be written as $a + b\alpha$ with $a, b \in \mathbb{Z}$. If $z = a + b\alpha$, then its complex conjugate is, as usual, $\bar{z} = a - b\alpha$. Just as in Gaussian and Eisenstein integers, we can take norms: $N(z) = z\bar{z}$, and we have

$$N(a + b\alpha) = (a + b\alpha)(a - b\alpha) = a^2 + 5b^2.$$

The usual properties of norm hold in $R$: it is multiplicative, the norm of a unit is 1, and conjugates have the same norm (Exercise 8.43 below).

There are two factorizations of 6 in $R$:

$$6 = 2 \cdot 3 = (1 + \sqrt{-5})(1 - \sqrt{-5}).$$

We claim that they are essentially different ways to factor 6 into primes. Let’s see why.

Lemma 8.51. (i) The rational integers 2 and 3 are prime in $R = \mathbb{Z}[\sqrt{-5}]$.

(ii) $1 + \alpha$ and $1 - \alpha$ are prime in $R$.

Proof. (i) If $2 = zw$ for non-units $z$ and $w$, then

$$4 = N(zw) = N(z)N(w).$$

By the Fundamental Theorem in $\mathbb{Z}$ (and the fact that neither $z$ nor $w$ is a unit, so that neither has norm 1), $N(z)$ would be a proper factor of 4, that is, $N(z) = 2$. But 2 can’t be written as $a^2 + 5b^2$. The proof for 3 uses exactly the same idea.

(ii) If $1 + \alpha = zw$ for non-units $z$ and $w$, then

$$6 = N(1 + \alpha) = N(z)N(w).$$

By the Fundamental Theorem in $\mathbb{Z}$ (and the fact that neither $z$ nor $w$ is a unit), $N(z)$ would be a proper factor of 6, that is, $N(z) = 2$ or $N(z) = 3$. But neither 2 nor 3 can be written as $a^2 + 5b^2$. The proof for $1 - \alpha$ uses exactly the same idea. ■
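The norm computations behind Lemma 8.51 are small enough to check exhaustively. In this sketch, an element $a + b\sqrt{-5}$ is stored as the pair `(a, b)`; the encoding and the function names are assumptions made for illustration only.

```python
from math import isqrt

def norm(z):
    """N(a + b*sqrt(-5)) = a^2 + 5 b^2."""
    a, b = z
    return a * a + 5 * b * b

def mul(z, w):
    """Multiply a + b*alpha and c + d*alpha, where alpha^2 = -5."""
    a, b = z
    c, d = w
    return (a * c - 5 * b * d, a * d + b * c)

def is_norm(n):
    """Is the nonnegative integer n of the form a^2 + 5 b^2?"""
    bound = isqrt(n)
    return any(
        norm((a, b)) == n
        for a in range(-bound, bound + 1)
        for b in range(-bound, bound + 1)
    )

print(mul((1, 1), (1, -1)))          # (1 + alpha)(1 - alpha)
print(norm((1, 1)), norm((1, -1)))   # both norms
print(is_norm(2), is_norm(3))        # can 2 or 3 occur as a norm?
```

Because the norm is multiplicative, a proper factor of 2, 3, or $1 \pm \alpha$ would have norm 2 or 3, and the search confirms that no element of $R$ has such a norm.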

So, we have two factorizations of 6 into primes in $R$. We’ve seen in other rings that different-looking factorizations are really the same up to unit factors. But that doesn’t happen here because neither 2 nor 3 is associate to $1 + \alpha$, for neither has norm $6 = N(1 + \alpha)$. We have a problem!

Kummer was working on methods for factoring cyclotomic integers (not, as it turns out, towards a proof of Fermat’s Last Theorem, but towards another, related question). He devised a way to think about our problem that actually shows up in elementary school when children think that $14 \times 15$ and $10 \times 21$ are different factorizations of 210. The students are not going far enough in their factorizations: if they write

$$14 = 2 \times 7 \quad\text{and}\quad 15 = 3 \times 5,$$

they see that the “other” factorization is just a rearrangement of the prime factors of $14 \times 15$:

$$10 = 2 \times 5 \quad\text{and}\quad 21 = 3 \times 7.$$

Actually, Kummer considered rings of cyclotomic integers. We’re using $R$ here just for the sake of example.

Exercises 5.51 and 5.52 on page 220 define the product of two ideals and develop the properties of the multiplication.

Now, our problem is different in the sense that we already have prime factorizations. But Kummer’s idea was to imagine some “ghost factors” for each of 2, 3, $1 + \alpha$, and $1 - \alpha$, sort of “super primes” behind the scenes, that could be rearranged to produce the different factorizations. Kummer called these ideal numbers or divisors, and he imagined there was a further factorization into ideal numbers $J_1, J_2, J_3, J_4$:

$$2 = J_1 J_2$$
$$3 = J_3 J_4$$
$$1 + \alpha = J_1 J_3$$
$$1 - \alpha = J_2 J_4.$$

Kummer knew that no such $J_i$ existed in $R$, but he was able to model these ghost factors, not as elements of $R$ but as “lists” of elements, each list containing the non-associate divisors of 2, 3, $1 + \alpha$, and $1 - \alpha$. And he developed a theory extending $R$ to a new system $R'$ in which there was unique factorization into ideal numbers. Later, Dedekind refined Kummer’s ideas, recasting ideal numbers into what we nowadays call ideals, a notion, as we’ve seen in this book, that has utility far beyond investigations into Fermat’s Last Theorem. We’ll use the contemporary notion of ideal to continue our story.

The basic idea is that products of elements are replaced by products of ideals. In a PID, nothing new is added, because there’s a bijection between ring elements (up to associates) and principal ideals (Exercise 5.51(ii) on page 220). But rings that are not UFDs are not PIDs (Theorem 6.50), so there’s a larger stash of ideals that can enter into factorizations.

Example 8.52. We’ve seen, in $R = \mathbb{Z}[\alpha]$, where $\alpha = \sqrt{-5}$, that

$$6 = 2 \cdot 3 = (1 + \alpha)(1 - \alpha).$$

The ghost factors that will resolve our problem are ideals in $R$ generated by two elements:

$$J_1 = (2, 1 + \alpha) = \{2a + b(1 + \alpha) : a, b \in R\}$$
$$J_2 = (2, 1 - \alpha) = \{2a + b(1 - \alpha) : a, b \in R\}$$
$$J_3 = (3, 1 + \alpha) = \{3a + b(1 + \alpha) : a, b \in R\}$$
$$J_4 = (3, 1 - \alpha) = \{3a + b(1 - \alpha) : a, b \in R\}.$$
We claim that

$$(2) = J_1 J_2$$
$$(3) = J_3 J_4$$
$$(1 + \alpha) = J_1 J_3$$
$$(1 - \alpha) = J_2 J_4.$$

The verifications all use the same method, so we’ll carry it out for the first case only, leaving the rest for you as Exercise 8.45 below.

Let’s show that $(2) = J_1 J_2$. Now the product of two ideals $I$ and $J$ is the set of all linear combinations of products $rs$ where $r \in I$ and $s \in J$ (Exercise 5.51 on page 220). So, $J_1 J_2 = (2, 1 + \alpha)(2, 1 - \alpha)$ is the set of all linear combinations of the form (recall that $(1 - \alpha)(1 + \alpha) = 6$):

$$a(2 \cdot 2) + b\big(2(1 - \alpha)\big) + c\big(2(1 + \alpha)\big) + d(1 + \alpha)(1 - \alpha) = 4a + 2b(1 - \alpha) + 2c(1 + \alpha) + 6d,$$

where $a, b, c, d \in R$. Well,

$$4a + 2b(1 - \alpha) + 2c(1 + \alpha) + 6d = 2\left[2a + b(1 - \alpha) + c(1 + \alpha) + 3d\right],$$

so $J_1 J_2 \subseteq (2)$. And, if

$$(a, b, c, d) = (-1, 0, 0, 1),$$

we have

$$4a + 2b(1 - \alpha) + 2c(1 + \alpha) + 6d = 2,$$

so that $(2) \subseteq J_1 J_2$. Hence

$$(2) = J_1 J_2$$

as claimed. The other verifications follow in the same way.

Ah, but there’s one glitch. What if one of the four ideals is (1), the unit ideal? If $J_1 = (1)$ for example, we’d have $(2) = (1 - \alpha)$, and we’d still have the same problem. But we can show that none of the $J_i$ is the unit ideal. Let’s show that $J_1 \ne (1)$; the arguments for the others are the same (Exercise 8.46 below).

Suppose, on the contrary, that $J_1 = (2, 1 + \alpha) = (1)$. Then there exist elements $r + s\alpha$ and $t + u\alpha$ in $R$, where $r, s, t, u \in \mathbb{Z}$, so that

$$1 = (r + s\alpha) \cdot 2 + (t + u\alpha)(1 + \alpha).$$

Multiply this out, using the fact that $\alpha^2 = -5$, and write the result as $x + y\alpha$ to obtain

$$1 = (2r + t - 5u) + (2s + t + u)\alpha.$$

It follows that

$$2r + t - 5u = 1$$
$$2s + t + u = 0.$$

Replace $u$ by $-2s - t$ in the first equation to obtain

$$2r + 6t + 10s = 1.$$

Since the left-hand side is even, this is impossible. ▲
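Both halves of Example 8.52 can be replayed in code. The sketch below again stores $a + b\alpha$ (with $\alpha^2 = -5$) as a pair `(a, b)`; the encoding and names are assumptions for illustration. It multiplies the generators of $J_1$ and $J_2$ pairwise, checks that each product is divisible by 2, recovers 2 from the combination $(a, b, c, d) = (-1, 0, 0, 1)$, and then tests the parity obstruction showing $1 \notin J_1$ on a small grid of coefficients.

```python
from itertools import product

def add(z, w):
    return (z[0] + w[0], z[1] + w[1])

def mul(z, w):
    """Multiply a + b*alpha and c + d*alpha, where alpha^2 = -5."""
    a, b = z
    c, d = w
    return (a * c - 5 * b * d, a * d + b * c)

gens_J1 = [(2, 0), (1, 1)]     # J1 = (2, 1 + alpha)
gens_J2 = [(2, 0), (1, -1)]    # J2 = (2, 1 - alpha)

# Generators of J1*J2: 4, 2(1 - alpha), 2(1 + alpha), 6
gens_product = [mul(g, h) for g in gens_J1 for h in gens_J2]
print(gens_product)

# J1*J2 is contained in (2): both coordinates of every generator are even
print(all(x % 2 == 0 and y % 2 == 0 for x, y in gens_product))

# (2) is contained in J1*J2: 2 = (-1)*4 + 0 + 0 + 1*6
coefficients = [(-1, 0), (0, 0), (0, 0), (1, 0)]
total = (0, 0)
for coeff, gen in zip(coefficients, gens_product):
    total = add(total, mul(coeff, gen))
print(total)

def element_of_J1(r, s, t, u):
    """(r + s*alpha)*2 + (t + u*alpha)*(1 + alpha), a typical element of J1."""
    return add(mul((r, s), (2, 0)), mul((t, u), (1, 1)))

# Every element x + y*alpha of J1 has x + y even, so 1 is not in J1
print(all(sum(element_of_J1(*c)) % 2 == 0 for c in product(range(-3, 4), repeat=4)))
```

The parity check is the same obstruction as in the text: adding the two coordinate equations gives $2r + 2s + 2t - 4u$, which is even, while $1 = 1 + 0\alpha$ has odd coordinate sum.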

Kummer introduced another brilliant idea. Call two ideals $I$ and $J$ equivalent if there is a cyclotomic integer $z$ so that

$$I = (z)J = \{zb : b \in J\}.$$

For rings of cyclotomic integers $\mathbb{Z}[\zeta_p]$, it turns out that this new kind of factorization into ideals is unique.

Note that these equations are equalities of ideals, not numbers.

If $a, b \in R$, where $R$ is a commutative ring, then $a \mid b$ if and only if $D(a) \supseteq D(b)$; thus, if $R$ is a domain, then $D(a) = D(b)$ if and only if $a$ and $b$ are associates.


He was able to show that this gives an equivalence relation on nonzero ideals in $\mathbb{Z}[\zeta_p]$ (for symmetry, the set of all ideals must be enlarged by adding in certain subsets of $\operatorname{Frac}(\mathbb{Z}[\zeta_p]) = \mathbb{Q}(\zeta_p)$ called fractional ideals). Most importantly, Kummer showed that this relation has only finitely many equivalence classes, and he called the number $h(p)$ of them the class number of $\mathbb{Z}[\zeta_p]$. If $\mathbb{Z}[\zeta_p]$ has class number 1, then all ideals are principal, there is unique factorization, and our proof of Fermat’s Last Theorem can be refined to produce a proof for such exponents. In fact, Kummer generalized this, proving that if the class number $h(p)$ is not divisible by $p$, then there are no positive integer solutions to $a^p + b^p = c^p$. This was a monumental achievement. Kummer called primes $p$ such that $p \nmid h(p)$ regular primes. For example, even though $\mathbb{Z}[\zeta_{23}]$ doesn’t have unique factorization, 23 is a regular prime (in fact, $h(23) = 3$), and so Fermat’s Last Theorem holds for it. Alas, there are irregular primes. The smallest is 37, and the next two are 59 and 67. Unfortunately, it is known that there are infinitely many irregular primes, and it’s unknown whether there are infinitely many regular primes.

Let’s now say a bit more about Kummer’s ideal numbers (nowadays called divisors), but we view his idea through the eyes of Dedekind. Take a cyclotomic integer $a \in \mathbb{Z}[\zeta]$, and define its divisor

$$D(a) = \{z \in \mathbb{Z}[\zeta] : a \text{ is a divisor of } z\}.$$

Now $D(a)$ is closed under addition and multiplication by other cyclotomic integers; that is, if $z, z' \in D(a)$, then $z + z' \in D(a)$; if $z \in D(a)$ and $r \in \mathbb{Z}[\zeta]$, then $rz \in D(a)$. In other words, $D(a)$ is an ideal (in fact, a principal ideal) in precisely the sense we have been using the term in this book (and we see how natural the idea is when viewed in this context). The definition of divisor makes sense for any commutative ring $R$, not just for the rings $\mathbb{Z}[\zeta]$.

Now generalize the notion of divisor so that, instead of being a subset of a commutative ring $R$ of the form $D(a)$ for some $a \in R$, it is a subset of $R$ closed under addition and multiplication by elements of $R$; that is, let’s replace $D(a)$, which is a principal ideal, by any ideal. Thus, if $a, b \in R$, then

$$D(a) + D(b) = \{z + w : z \in D(a) \text{ and } w \in D(b)\}$$

is a generalized divisor. If we denote $D(a) + D(b)$ by $D(c)$, pretending that generalized divisors are just ordinary divisors, then we cannot declare that $c$ is an element of $R$. Thus, $c$ is a “ghost” element. Of course, if $R$ is a PID, then $c$ is an element of $R$, but if $R$ is not a PID, then $c$ may be a creature of our imagination.

Consider the ring $R = \mathbb{Z}[\alpha]$ in Example 8.52, where $\alpha = \sqrt{-5}$. The 
factorizations of 6, 

$6 = 2 \cdot 3$ and $6 = (1 + \alpha)(1 - \alpha)$, 

involve four elements of R, each of which gives a divisor. As in the example, 
define 

$J_1 = D(2) + D(1 + \alpha)$ 

$J_2 = D(2) + D(1 - \alpha)$ 

$J_3 = D(3) + D(1 + \alpha)$ 

$J_4 = D(3) + D(1 - \alpha)$. 

We can pretend that there are ghosts $c_i$, so that $J_i = D(c_i)$ for i = 1, 2, 3, 4. 
To complete the story, we report that ghosts are primes: the ideals $J_i$ can 
be shown to be prime ideals, using the notion of the norm of ideals. Moreover, 
one can prove that factorizations in terms of such ghosts are unique, using 
fractional ideals. 
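
The two factorizations of 6 can be checked mechanically. The following is a minimal sketch in Python (our choice of language; the pair representation and the helper names `mul`, `norm`, and `elements_of_norm` are illustrative assumptions, not notation from the text). It stores $a + b\sqrt{-5}$ as the pair (a, b), confirms both products, and confirms that no element has norm 2 or 3. Granting Exercise 8.43 (the norm is multiplicative and units have norm 1), this shows that none of 2, 3, $1 + \alpha$, $1 - \alpha$ can be factored further in R.

```python
# Elements a + b*sqrt(-5) of Z[sqrt(-5)] are stored as integer pairs (a, b).

def mul(z, w):
    """Product of two elements, using (sqrt(-5))**2 = -5."""
    a, b = z
    c, d = w
    return (a * c - 5 * b * d, a * d + b * c)

def norm(z):
    """N(a + b*sqrt(-5)) = a**2 + 5*b**2."""
    a, b = z
    return a * a + 5 * b * b

def elements_of_norm(n):
    """All pairs (a, b) with a**2 + 5*b**2 == n, found by brute force."""
    bound = int(n ** 0.5) + 1
    return [(a, b)
            for a in range(-bound, bound + 1)
            for b in range(-bound, bound + 1)
            if a * a + 5 * b * b == n]

two, three = (2, 0), (3, 0)
one_plus, one_minus = (1, 1), (1, -1)      # 1 + alpha and 1 - alpha

print(mul(two, three))                     # (6, 0)
print(mul(one_plus, one_minus))            # (6, 0)
print([norm(z) for z in (two, three, one_plus, one_minus)])  # [4, 9, 6, 6]
print(elements_of_norm(2), elements_of_norm(3))              # [] []
```

A proper factor of an element of norm 4, 9, or 6 would need norm 2 or 3, and the last line shows there is no such element.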


How to Think About It. One of the contributions of Fermat’s Last Theorem 
to algebra is that it attracted mathematicians of the first order and, as they 
studied it, they enhanced the areas of mathematics impinging on it. For algebra 
in particular, it brought the idea of commutative rings, factorization, and unique 
factorization to the forefront. Kummer’s recognition that unique factorization 
was not always present, and his restoration of it with his “ideal numbers,” led 
Dedekind to introduce ideals into the study of rings. Dedekind’s notion of ideal 
was taken up by Hilbert and then later by Emmy Noether. It is today one of the 
most fundamental ideas in modern algebra. 

We have a confession to make. Our discussion in Chapter 6 explains parallels 
of the arithmetic of polynomials with coefficients in a field k with the 
arithmetic of integers by saying that both $k[x]$ and $\mathbb{Z}$ are PIDs. No doubt, our 
ancestors were aware of the analogy between these two systems, but viewing 
them in terms of ideals is a modern viewpoint, after Dedekind, dating from 
the 1920s. We wrote Chapter 6 using contemporary ideas because it unifies the 
exposition. 


Richard Dedekind was born in 1831 in Braunschweig (in what is now Germany). 
He entered the University of Göttingen in 1850; it was a rather disappointing 
place to study mathematics at the time, for it had not yet become the 
vigorous research center it turned into soon afterwards. Gauss taught courses 
in mathematics, but mostly at an elementary level. Dedekind did his doctoral 
work under Gauss’s supervision, receiving his doctorate in 1852; he was to be 
the last pupil of Gauss. 

In 1854, both Riemann and Dedekind were awarded their habilitation degrees 
within a few weeks of each other. Dedekind was then qualified as a university 
teacher, and he began teaching at Göttingen. Gauss died in 1855, and 
Dirichlet was appointed to fill the vacant chair. This was an extremely impor- 
tant event for Dedekind, who found working with Dirichlet extremely prof- 
itable. He attended courses by Dirichlet, and they soon became close friends; 
the relationship was in many ways the making of Dedekind, whose mathemat- 
ical interests took a new lease on life with their discussions. Around this time 
Dedekind studied the work of Galois, and he was the first to lecture on Galois 
theory when he taught a course on the topic at Göttingen. 

In the spring of 1858, Dedekind was appointed to the Polytechnikum in 
Zurich. It was while he was thinking how to teach differential and integral 
calculus that the idea of a Dedekind cut came to him. His idea was that every 
real number r divides the rational numbers into two subsets, namely those 
greater than r and those less than r . Dedekind’s brilliant idea was to represent 
the real numbers by such divisions of the rationals. 

The Collegium Carolinum in Brunswick had been upgraded to the Brunswick 
Polytechnikum by the 1860s, and Dedekind was appointed there in 1862. He 
returned to his home town, remaining there for the rest of his life, retiring in 
1894. Dedekind died in 1916. 
Dedekind made a number of highly significant contributions to mathematics 
and his work would change the style of mathematics into what is familiar to us 
today. One remarkable piece of work was his redefinition of irrational numbers 
in terms of Dedekind cuts, as we mentioned above. His work in number the- 
ory, particularly in algebraic number fields, is of major importance. He edited 
Dirichlet’s lectures, and it was in their third and fourth editions [8], published 
in 1879 and 1894, that Dedekind wrote supplements in which he introduced 
the notion of an ideal. Dedekind’s work was quickly accepted, partly because 
of the clarity with which he presented his ideas. 

Dedekind’s brilliance consisted not only of the theorems and concepts that 
he studied but, because of his ability to formulate and express his ideas so 
clearly, his new style of mathematics has been a major influence on mathe- 
maticians ever since. 

The full proof of Fermat’s Last Theorem had to wait for much more pow- 
erful methods, developed in the latter half of the 20th century. More about this 
in the next chapter. 

Exercises 

8.43 Let $R = \mathbb{Z}[\sqrt{-5}]$ and let $N : R \to \mathbb{Z}$ be the norm map: $N(z) = z\bar{z}$. Show that 

(i) $N(zw) = N(z)N(w)$ for all $z, w \in R$. 

(ii) $u$ is a unit in R if and only if $N(u) = 1$. 

(iii) If $z \in R$, $N(z) = N(\bar{z})$. 

(iv) If $a \in \mathbb{Z}$, $N(a) = a^2$. 

8.44 Find all the units in $R = \mathbb{Z}[\sqrt{-5}]$. 


PROMYS 06 

$(1 + \sqrt{-5}) \cdot (1 - \sqrt{-5}) = 6$ 

Mathematicians are annoyingly precise. 
-G. H. Stevens 

Figure 8.1. The front of a T-shirt. 


8.45 Referring to Example 8.52, show that 

$(3) = J_3 J_4$ 
$(1 + \alpha) = J_1 J_3$ 
$(1 - \alpha) = J_2 J_4$. 

8.46 Referring to Example 8.52, show that none of $J_2$, $J_3$, $J_4$ is the unit ideal in R. 

8.47 Referring to Example 8.52, 

(i) The ideal generated by the norms of elements in $J_1$ is an ideal in $\mathbb{Z}$, and hence 
is principal. Find a generator for it. 

(ii) Do the same for the other ideals $J_i$ (i = 2, 3, 4). 

8.48 Take It Further. Figure 8.1 is the front of a T-shirt that illustrates that 

$2 \cdot 3 = (1 + \sqrt{-5})(1 - \sqrt{-5})$. 

Explain. 

8.5 Connections: Counting Sums of Squares 

This section investigates an extension of a question we asked in Section 8.2. 
You saw, in Corollary 8.20, that an odd rational prime can be written as a 
sum of two squares if and only if it is congruent to 1 mod 4. What about 
composite integers? For example, 15 can’t be written as $a^2 + b^2$, but 65 can: 
$65 = 8^2 + 1^2$. In fact, 65 can be written as the sum of two squares in another 
way: $65 = 4^2 + 7^2$. This leads to the following question: 

In how many ways can a positive integer be written as a sum of two 
squares ? 

The surprising answer to this question was first discovered by Fermat. Just as 
we used the arithmetic of Eisenstein integers to prove Fermat’s Last Theorem 
for exponent 3, we’ll use the arithmetic of Gaussian integers to understand 
Fermat’s discovery. 

Before continuing, let’s first consider n = 5. Now 5 is a sum of two squares: 
$5 = 2^2 + 1^2$. We recognize the norm of a Gaussian integer, for $5 = 2^2 + 1^2 = N(2 + i)$. 
Is there another way to write 5 as a sum of two squares? Recall that 5 
splits in $\mathbb{Z}[i]$ as $(2 + i)(2 - i)$, which suggests writing 5 as $N(2 - i)$; that is, 
$5 = 2^2 + (-1)^2$. If we agree, when we write $n = a^2 + b^2$, that both a and b 
are nonnegative, then we can ignore the second equation $5 = N(2 - i)$. Ah, 
but there’s another way to write it as $N(a + bi)$ with both a, b nonnegative. 
While $2 - i$ doesn’t have nonnegative real and imaginary parts, it is associate 
to $1 + 2i$, because 

$i(2 - i) = 1 + 2i$; 

and $2 + i$ and $1 + 2i$ are not associates (why?). So there are two bona fide 
non-associate Gaussian integers $a + bi$ with nonnegative a and b and norm 5. Let’s 
agree, then, that 5 is a sum of two squares in two ways: $2^2 + 1^2$ and $1^2 + 2^2$. 
The following definition of a counting function makes sense. 

Definition. The function $r : \mathbb{N} \to \mathbb{N}$ is defined on nonnegative integers by 

r(n) = the number of non-associate Gaussian integers of norm n. 

Since we are interested in equations $n = a^2 + b^2$ in which a and b are 
nonnegative, it is reasonable to restrict our attention to non-associate Gaussian 
integers in the first quadrant. By Exercise 8.13 on page 343, every Gaussian 
integer is associate to exactly one Gaussian integer in the first quadrant of 
the complex plane. We’ve been using the term “first quadrant” throughout the 
book, often without properly defining it. We now insist that the positive x-axis 
is in the first quadrant but that the positive y-axis is not. The reason is, viewing 
$\mathbb{R}^2$ as $\mathbb{C}$, that we want to find a piece of the complex plane that contains one 
Gaussian integer from each class of associates. 

We should say “as a 
sum of two squares of 
nonnegative integers.” 

Definition. The first quadrant of the complex plane is 

$Q_1 = \{z = a + bi \in \mathbb{C} : a > 0,\ b \ge 0\}$. 

The real number c is associate to ic (on the imaginary axis), and we don’t 
include ic in the first quadrant. Also, the origin $0 = 0 + 0i$ does not lie in $Q_1$. 
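
To see the choice of $Q_1$ at work, here is a small Python sketch (the function names `associates` and `first_quadrant_rep` are our own, introduced only for illustration). It lists the four associates of a Gaussian integer, obtained by multiplying by the units 1, i, -1, -i, and picks out the one with a > 0 and b ≥ 0.

```python
def associates(a, b):
    """The four associates of a + bi: its products with 1, i, -1, -i."""
    return [(a, b), (-b, a), (-a, -b), (b, -a)]

def first_quadrant_rep(a, b):
    """The associate of a nonzero a + bi that lies in Q_1 (a > 0, b >= 0)."""
    if (a, b) == (0, 0):
        raise ValueError("0 has no associate in Q_1")
    return next((x, y) for (x, y) in associates(a, b) if x > 0 and y >= 0)

print(first_quadrant_rep(2, -1))   # (1, 2): 2 - i is associate to 1 + 2i
print(first_quadrant_rep(0, 3))    # (3, 0): 3i is associate to 3
print(first_quadrant_rep(2, 1))    # (2, 1): already in Q_1
```

Putting the positive x-axis in $Q_1$ and leaving the positive y-axis out is what makes the representative unique: 3 and 3i are associates, and only 3 is selected.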


In light of these remarks, we modify the definition of r(n) for the purpose at 
hand, counting only Gaussian integers in the first quadrant (as we wrote above, 
Exercise 8.13 says that two such integers are necessarily not associate). 

$r(n) = |\{z \in \mathbb{Z}[i] : N(z) = n \text{ and } z \in Q_1\}|$. 

It’s worth calculating r(n) to get a feel for what it is counting. High school 
students should enjoy working out some of these numbers by hand (there is no 
need to mention machinery of $\mathbb{Z}[i]$). Here are some values for them to check. 


You can also check that 
r(15625) = 7 and, if 
you are ambitious, that 
r(815730721) = 9. 

Or, pick a few primes p, 
say 3, 5, 7, 11, 13 and 
see what happens as 
you calculate the values 
of $r(p^k)$. What about 
products of two primes? 

  n  r(n)     n  r(n)     n  r(n)     n  r(n)     n  r(n)
  1   1      11   0      21   0      31   0      41   2
  2   1      12   0      22   0      32   1      42   0
  3   0      13   2      23   0      33   0      43   0
  4   1      14   0      24   0      34   2      44   0
  5   2      15   0      25   3      35   0      45   2
  6   0      16   1      26   2      36   1      46   0
  7   0      17   2      27   0      37   2      47   0
  8   1      18   1      28   0      38   0      48   0
  9   1      19   0      29   2      39   0      49   1
 10   2      20   2      30   0      40   2      50   3


Look for regularity in the table, make some conjectures, and try to prove them. 
For example, can you see anything that the values of n for which r(n) = 0 have 
in common? 
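
For readers who would rather let a machine do the bookkeeping, the table can be reproduced by brute force. This is only a sketch, written in Python (our choice of language); it counts the pairs (a, b) with a > 0, b ≥ 0 and $a^2 + b^2 = n$, which is the first-quadrant count r(n) described in the text.

```python
from math import isqrt

def r(n):
    """Number of Gaussian integers a + bi with a > 0, b >= 0 and a^2 + b^2 = n."""
    count = 0
    for a in range(1, isqrt(n) + 1):
        b_squared = n - a * a
        b = isqrt(b_squared)
        if b * b == b_squared:        # n - a^2 is a perfect square
            count += 1
    return count

# Print the table row by row: n, n + 10, n + 20, n + 30, n + 40.
for row in range(1, 11):
    print("   ".join(f"{n:2d}: {r(n)}" for n in range(row, 51, 10)))

print(r(15625), r(815730721))         # 7 9
```

The same function makes it easy to experiment with prime powers and with products of two primes.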

It’s likely that Fermat did exactly these kinds of investigations — lots of pur- 
poseful numerical calculations — to arrive at an amazing result that we’ll prove 
in this section. 


Theorem 8.53 (Fermat). The number of ways an integer n can be written as 
a sum of two squares is the excess of the number of divisors of n of the form 
4k + 1 over the number of divisors of n of the form 4k + 3; that is, if 

A(n) = the number of divisors of n of form 4k + 1 

and 

B(n) = the number of divisors of n of form 4k + 3, 

then 

$r(n) = A(n) - B(n)$. 
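
Before proving the theorem, it is reassuring to test it numerically. The Python sketch below (names such as `excess` are ours, chosen for illustration) computes A(n) − B(n) directly from the divisors of n and compares it with a brute-force count of first-quadrant Gaussian integers of norm n. It is evidence, not a proof.

```python
from math import isqrt

def r(n):
    """Brute-force count of a + bi with a > 0, b >= 0 and a^2 + b^2 = n."""
    return sum(1 for a in range(1, isqrt(n) + 1)
               if isqrt(n - a * a) ** 2 == n - a * a)

def excess(n):
    """A(n) - B(n): divisors of the form 4k + 1 minus those of the form 4k + 3."""
    divisors = [d for d in range(1, n + 1) if n % d == 0]
    A = sum(1 for d in divisors if d % 4 == 1)
    B = sum(1 for d in divisors if d % 4 == 3)
    return A - B

print(excess(65), excess(21))                                # 4 0
print(all(r(n) == excess(n) for n in range(1, 2001)))        # True
```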
The proof that we’ll develop uses some new machinery as well as the law 
of decomposition for Gaussian integers. First, a few examples that show some 
of the delightful consequences of the theorem. 

Example 8.54. (i) Consider $n = 65 = 13 \cdot 5$. Its divisors are 

1, 5, 13, 65. 

There are four divisors congruent to 1 mod 4 and none congruent to 
3 mod 4, so r(65) = 4. Sure enough, 

65 = 1 + 64 
= 64+1 
= 16 + 49 
= 49+ 16. 


(ii) Let n = 21. Its divisors are 

1, 3, 7, 21. 

There are two divisors that are 1 mod 4 and two that are 3 mod 4, so 
r(21) = 0. Thus, 21 is not a sum of two squares. 

(iii) Let $n = 3^m$ for some integer m. Odd powers of 3 are congruent to 
3 mod 4, while even powers are 1 mod 4. The divisors of $3^m$ are 

$1, 3, 3^2, 3^3, \ldots, 3^m$. 

It follows that 

$r(3^m) = \begin{cases} 0 & \text{if } m \text{ is odd} \\ 1 & \text{if } m \text{ is even.} \end{cases}$ ▲ 

See Exercise 8.49 on 
page 377. 


Corollary 8.55. For any positive integer n, we have $A(n) \ge B(n)$; that is, n 
has at least as many divisors of the form 4k + 1 as it has divisors of the form 
4k + 3. 

Proof. By Theorem 8.53, we have $A(n) - B(n) = r(n)$, and $r(n) \ge 0$. ■ 

A Proof of Fermat’s Theorem on Divisors 

Let’s now prove Theorem 8.53. Our proof requires a device that finds applications 
all over mathematics: a theory developed by Dirichlet that uses formal 
algebra to answer combinatorial questions in arithmetic. 

Once again, it’s time to pull 
out the pencil and paper. 


Definition. A formal Dirichlet series is an expression of the form 

$\sum_{n=1}^{\infty} \frac{a(n)}{n^s} = a(1) + \frac{a(2)}{2^s} + \frac{a(3)}{3^s} + \cdots,$ 

where the a(n) are complex numbers. (It will be useful to write a(n) instead 
of the usual $a_n$.) 
. . . [to] omit those parts 
of the subject, however, 
is like listening to a 
stereo broadcast of, 
say Beethoven’s Ninth 
Symphony, using only the 
left audio channel. [39]. 


Dirichlet series are not 
formal power series, and 
multiplication is not the 
same as in $\mathbb{C}[[x]]$. 


Actually, the zeta function 
usually means the function 
of a complex variable s that 
analytically continues this 
infinite series. 


The word “formal” is important here — we think of these series as book- 
keeping devices keeping track of combinatorial or numerical data (as in Ex- 
ample 2.31). So, we don’t worry about questions of convergence; we think of 
s simply as an indeterminate rather than as a variable that can be replaced by 
a real or complex number. This misses many of the wonderful analytic appli- 
cations of such series, but it turns out that their formal algebraic properties are 
all we need for this discussion. 

Dirichlet series are added and multiplied formally. Addition is done term by 
term: 

$\sum_{n=1}^{\infty} \frac{a(n)}{n^s} + \sum_{n=1}^{\infty} \frac{b(n)}{n^s} = \sum_{n=1}^{\infty} \frac{a(n) + b(n)}{n^s}.$ 

Multiplication is also done term by term, but then one gathers up all terms with 
the same denominator. So, for example, if we’re looking for $c(12)/12^s$ in 

$\sum_{n=1}^{\infty} \frac{a(n)}{n^s} \cdot \sum_{n=1}^{\infty} \frac{b(n)}{n^s} = \sum_{n=1}^{\infty} \frac{c(n)}{n^s},$ 

then a denominator of $12^s$ could come only from the products 

$\frac{a(1)}{1^s}\frac{b(12)}{12^s},\quad \frac{a(2)}{2^s}\frac{b(6)}{6^s},\quad \frac{a(3)}{3^s}\frac{b(4)}{4^s},\quad \frac{a(4)}{4^s}\frac{b(3)}{3^s},\quad \frac{a(6)}{6^s}\frac{b(2)}{2^s},\quad \frac{a(12)}{12^s}\frac{b(1)}{1^s}.$ 

In general, the coefficient c(n) in Eq. (8.5) is given by 

$c(n) = \sum_{d \mid n} a(d) \cdot b\!\left(\frac{n}{d}\right), \qquad (8.19)$ 

where $\sum_{d \mid n}$ means that the sum is over the divisors of n. 
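
The gathering of terms in Eq. (8.19) is easy to mechanize. The Python sketch below is an illustration under our own naming (`dirichlet_product` is not a name from the text): it takes two coefficient functions a and b and returns the coefficients c(1), ..., c(N) of the formal product, adding a(d)b(e) into the slot with denominator $(de)^s$.

```python
def dirichlet_product(a, b, N):
    """Coefficients c(1..N) of the formal product of the series of a and of b.

    a and b are functions on positive integers; the result is a list c
    with c[n] = sum of a(d) * b(n // d) over the divisors d of n (c[0] unused).
    """
    c = [0] * (N + 1)
    for d in range(1, N + 1):
        for e in range(1, N // d + 1):
            c[d * e] += a(d) * b(e)    # the term a(d)/d^s * b(e)/e^s
    return c

def one(n):
    return 1

def identity(n):
    return n

# The pairs (d, 12/d) that contribute to the denominator 12^s:
print([(d, 12 // d) for d in range(1, 13) if 12 % d == 0])
# [(1, 12), (2, 6), (3, 4), (4, 3), (6, 2), (12, 1)]

print(dirichlet_product(one, one, 12)[12])        # 6, the number of divisors of 12
print(dirichlet_product(identity, one, 12)[12])   # 28, the sum of the divisors of 12
```

Taking b to be the constant function 1 corresponds to multiplying by the series all of whose coefficients are 1, so that c(n) is just the sum of a(d) over the divisors d of n.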

The simplest Dirichlet series is the Riemann zeta function: 

$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}.$ 

Eq. (8.19) implies that if 

$\zeta(s) \sum_{n=1}^{\infty} \frac{a(n)}{n^s} = \sum_{n=1}^{\infty} \frac{c(n)}{n^s},$ 

then 

$c(n) = \sum_{d \mid n} a(d). \qquad (8.20)$ 

Eq. (8.20) is, as we’ll see, 
extremely useful, and it’s 
the reason for defining 
Dirichlet series. 


Let’s state this as a theorem. 

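Theorem 8.56. If 

$\zeta(s) \sum_{n=1}^{\infty} \frac{a(n)}{n^s} = \sum_{n=1}^{\infty} \frac{c(n)}{n^s},$ 

then $c(n) = \sum_{d \mid n} a(d)$. 

Proof. Expand 

$\left(1 + \frac{1}{2^s} + \frac{1}{3^s} + \cdots\right)\left(a(1) + \frac{a(2)}{2^s} + \frac{a(3)}{3^s} + \cdots\right)$ 

and gather terms with the same denominator. ■ 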

Sometimes, the coefficients a(n) have interesting properties. For example: 


Definition. A function $a : \mathbb{N} \to \mathbb{C}$ is strongly multiplicative if, for all 
nonnegative integers m, n, 

$a(mn) = a(m)a(n)$. 

(A function $a : \mathbb{N} \to \mathbb{C}$ is multiplicative if $a(mn) = a(m)a(n)$ whenever 
$\gcd(m, n) = 1$.) 

When a is strongly multiplicative, the Dirichlet series with coefficients a(n) 
has an alternate form that shows its connection with arithmetic. 

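Theorem 8.57. If a is a strongly multiplicative function, then the Dirichlet 
series 

$f(s) = \sum_{n=1}^{\infty} \frac{a(n)}{n^s}$ 

has a product expansion 

$f(s) = \prod_{p} \frac{1}{1 - \frac{a(p)}{p^s}},$ 

where the product is over all prime numbers p. 


Proof. Each factor on the right side is a geometric series: 

$$\begin{aligned}
\frac{1}{1 - \frac{a(p)}{p^s}} &= 1 + \frac{a(p)}{p^s} + \left(\frac{a(p)}{p^s}\right)^2 + \left(\frac{a(p)}{p^s}\right)^3 + \cdots \\
&= 1 + \frac{a(p)}{p^s} + \frac{a(p^2)}{p^{2s}} + \frac{a(p^3)}{p^{3s}} + \cdots.
\end{aligned}$$

Multiply these together (one for each prime) and you get the sum of every 
possible expression of the form 

$\frac{a(p_1^{e_1})\, a(p_2^{e_2}) \cdots a(p_r^{e_r})}{p_1^{e_1 s}\, p_2^{e_2 s} \cdots p_r^{e_r s}} = \frac{a(p_1^{e_1} p_2^{e_2} \cdots p_r^{e_r})}{(p_1^{e_1} p_2^{e_2} \cdots p_r^{e_r})^s}.$ 

Since every positive integer n can be written in one and only one way as a product of 
powers of primes (the fundamental theorem again), this is the same as the sum 

$\sum_{n=1}^{\infty} \frac{a(n)}{n^s}.$ ■ 


To be rigorous, we should 
put some restrictions on 
the values of a(k) to 
ensure that the series 
converges. 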
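
The product expansion can be tested formally, with no convergence questions, by multiplying out the geometric series and keeping only the terms with denominator $n^s$ for n ≤ N. The Python sketch below does this under our own naming (`euler_product_coefficients` is illustrative). It is tried on three strongly multiplicative functions: the constant 1, the identity a(n) = n, and the mod-4 sign function that appears in Example 8.58.

```python
from math import isqrt

def primes_up_to(N):
    """Primes p <= N by trial division (adequate for small N)."""
    return [p for p in range(2, N + 1)
            if all(p % q for q in range(2, isqrt(p) + 1))]

def euler_product_coefficients(a, N):
    """Coefficients, for n <= N, of the product over primes p <= N of
    the geometric series 1 + a(p)/p^s + a(p)^2/p^(2s) + ...
    Primes larger than N cannot affect these coefficients."""
    coeffs = [0] * (N + 1)
    coeffs[1] = 1                          # start from the series "1"
    for p in primes_up_to(N):
        new = [0] * (N + 1)
        for n in range(1, N + 1):
            if coeffs[n] == 0:
                continue
            pk, term = 1, 1                # p^k and a(p)^k
            while n * pk <= N:
                new[n * pk] += coeffs[n] * term
                pk *= p
                term *= a(p)
        coeffs = new
    return coeffs

def chi(n):
    """1 if n = 1 mod 4, -1 if n = 3 mod 4, 0 if n is even."""
    if n % 2 == 0:
        return 0
    return 1 if n % 4 == 1 else -1

N = 200
for a in (lambda n: 1, lambda n: n, chi):
    product_side = euler_product_coefficients(a, N)
    print(all(product_side[n] == a(n) for n in range(1, N + 1)))   # True each time
```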
Example 8.58. (i) The constant function a(n) = 1 is strongly multiplicative, 
so the Riemann zeta function has a product expansion 

$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} = \prod_{p} \frac{1}{1 - \frac{1}{p^s}}.$ 

(ii) Here’s a multiplicative function that’s connected to our work with Gaussian 
integers: 

$\chi(n) = \begin{cases} 1 & \text{if } n \equiv 1 \pmod 4 \\ -1 & \text{if } n \equiv 3 \pmod 4 \\ 0 & \text{if } n \text{ is even.} \end{cases}$ 

χ is called a quadratic 
character. 

You can check that χ is strongly multiplicative, and so 

$\sum_{n=1}^{\infty} \frac{\chi(n)}{n^s} = \prod_{p} \frac{1}{1 - \frac{\chi(p)}{p^s}}.$ 

Now, by Theorem 8.56, if 

$\zeta(s) \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s} = \sum_{n=1}^{\infty} \frac{a(n)}{n^s}, \quad \text{then} \quad a(n) = \sum_{d \mid n} \chi(d). \qquad (8.21)$ 

So, a(n) is the excess of the number of divisors of n of the form 4k + 1 over the 
number of divisors of n of the form 4k + 3. Bingo: this is exactly the function 
that is at the heart of Theorem 8.53. The idea, then, is to form the Dirichlet 
series with coefficients r(n) and to show that 

$\sum_{n=1}^{\infty} \frac{r(n)}{n^s} = \zeta(s) \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s}.$ 

To do this, we’ll convert each of the sums to products. We already have done 
this in Example 8.58 for the sums on the right-hand side; for the left-hand side, 
we argue as follows. 

Each term in the left-hand sum is a sum of unit fractions, and the number 
of such fractions is the number of Gaussian integers with given norm. For 
example, $3/25^s$ comes from 

$\frac{1}{N(3 + 4i)^s} + \frac{1}{N(4 + 3i)^s} + \frac{1}{N(5 + 0i)^s}.$ 

Using this idea and the multiplicativity of N, we get a product formula for the 
left-hand side. 

$$\begin{aligned}
\sum_{n=1}^{\infty} \frac{r(n)}{n^s} &= \sum_{\alpha \in Q_1} \frac{1}{(N(\alpha))^s} \\
&= \prod_{p \in Q_1} \sum_{k=0}^{\infty} \frac{1}{\left((N(p))^k\right)^s} && \text{(use the fundamental theorem in } \mathbb{Z}[i]\text{)} \\
&= \prod_{p \in Q_1} \frac{1}{1 - \frac{1}{N(p)^s}} && \text{(sum a geometric series).}
\end{aligned}$$
Here, the product is over all Gaussian primes in the first quadrant. This is 
another example that is best understood by calculating a few coefficients by 
hand. 

Now use Theorem 8.21 (the law of decomposition for $\mathbb{Z}[i]$). Every prime in 
$Q_1$ lies over one of these: 

• the prime 2. There’s only one in the first quadrant: $1 + i$, and $N(1 + i) = 2$. 

• a prime p congruent to 1 mod 4. There are two for each such p: if 

$p = \pi\bar{\pi}$, 

then both $\pi$ and $\bar{\pi}$ have an associate in $Q_1$ (and they are different), and each 
has norm p. 

• a prime p congruent to 3 mod 4. There’s only one such prime in $Q_1$, because 
such a p is inert and $N(p) = p^2$. 

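So, 

$$\begin{aligned}
\sum_{n=1}^{\infty} \frac{r(n)}{n^s}
&= \prod_{z \in Q_1} \frac{1}{1 - \frac{1}{N(z)^s}} \\
&= \frac{1}{1 - \frac{1}{2^s}} \left(\prod_{p \equiv 1 \bmod 4} \frac{1}{1 - \frac{1}{p^s}}\right)^{2} \left(\prod_{p \equiv 3 \bmod 4} \frac{1}{1 - \frac{1}{p^{2s}}}\right) \\
&= \frac{1}{1 - \frac{1}{2^s}} \left(\prod_{p \equiv 1 \bmod 4} \frac{1}{1 - \frac{1}{p^s}}\right)^{2} \left(\prod_{p \equiv 3 \bmod 4} \frac{1}{1 - \frac{1}{p^s}}\right) \left(\prod_{p \equiv 3 \bmod 4} \frac{1}{1 + \frac{1}{p^s}}\right) \\
&= \left(\frac{1}{1 - \frac{1}{2^s}} \prod_{p \text{ odd}} \frac{1}{1 - \frac{1}{p^s}}\right) \left(\prod_{p \equiv 1 \bmod 4} \frac{1}{1 - \frac{1}{p^s}}\right) \left(\prod_{p \equiv 3 \bmod 4} \frac{1}{1 + \frac{1}{p^s}}\right) \\
&= \zeta(s) \left(\prod_{p \equiv 1 \bmod 4} \frac{1}{1 - \frac{\chi(p)}{p^s}}\right) \left(\prod_{p \equiv 3 \bmod 4} \frac{1}{1 - \frac{\chi(p)}{p^s}}\right) \\
&= \zeta(s) \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s} \\
&= \sum_{n=1}^{\infty} \frac{a(n)}{n^s},
\end{aligned}$$

where, by Eq. (8.21), 

$a(n) = \sum_{d \mid n} \chi(d)$. 

It follows that r(n) = a(n), and we’ve proved Theorem 8.53. 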


Exercises 

8.49 Suppose that $m \ge 1$ is an integer. Show that if p is a prime and $p \equiv 3 \bmod 4$, then 

$r(p^m) = \begin{cases} 0 & \text{if } m \text{ is odd} \\ 1 & \text{if } m \text{ is even.} \end{cases}$ 

8.50 Suppose that $m \ge 1$ is an integer. Show that if p is a prime and $p \equiv 1 \bmod 4$, then 

$r(p^m) = m + 1$. 

It might be easier to 
see things if you use 4r 
instead of r, allowing 
Gaussian integers in all 
four quadrants. 

8.51 Suppose that $m \ge 1$ is an integer. Show that 

$r(2^m) = 1$. 

8.52 Show that if a is a strongly multiplicative function, so is b, where b is defined by 

$b(n) = \sum_{d \mid n} a(d)$. 

8.53 A multiplicative function is a function $a : \mathbb{N} \to \mathbb{C}$ so that $a(mn) = a(m)a(n)$ 
whenever $\gcd(m, n) = 1$. 

(i) Give an example of a multiplicative function that is not strongly multiplicative. 

(ii) Show that if a is a multiplicative function, so is b, where 

$b(n) = \sum_{d \mid n} a(d)$. 

If a is strongly multiplicative, 
is b? 

8.54 Show that, if $\gcd(m, n) = 1$, then 

$r(mn) = r(m)r(n)$. 

8.55 Show that an integer can be written as a sum of two squares if and only if the 
primes in its prime factorization that are congruent to 3 mod 4 show up with even 
exponents. 

8.56 Take It Further. Show that 

$\sum_{n=1}^{\infty} \frac{\phi(n)}{n^s} = \frac{\zeta(s - 1)}{\zeta(s)},$ 

where $\phi$ is the Euler $\phi$-function. 

8.57 Take It Further. Tabulations of r show some erratic behavior with no apparent 
pattern. When that happens with a function f, it’s often useful to look at the 
asymptotic behavior of its average value: 

$\lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} f(k).$ 

Investigate the asymptotic behavior of the average value of r. 




9 Epilog 


Attempts to resolve Fermat’s Last Theorem have led to much modern alge- 
bra. There were many other areas of mathematical research in the seventeenth, 
eighteenth and nineteenth centuries, one of which was determining the roots of 
polynomials. Informally, a polynomial is solvable by radicals if its roots can 
be given by a formula generalizing the classical quadratic, cubic, and quar- 
tic formulas. In 1824, Abel proved that there are quintic polynomials that are 
not solvable by radicals and in 1828 he found a class of polynomials, of any 
degree, that are solvable by radicals. In 1830, Galois, the young wizard who 
was killed before his 21st birthday, characterized all the polynomials which 
are solvable by radicals, greatly generalizing Abel’s theorem. Galois’ brilliant 
idea was to exploit symmetry through his invention of group theory. 

After a brief account of the lives of Abel and Galois, we will use ring theory 
to make the notion of solvability by radicals precise. This will enable us to un- 
derstand the work of Abel and Galois showing why there is no generalization 
of the classical formulas to polynomials of higher degree. We will then intro- 
duce some group theory, not only because groups were the basic new idea in 
the study of polynomials, but because they are one of the essential ingredients 
in Wiles’ proof of Fermat’s Last Theorem in 1995. In fact, symmetry is an im- 
portant fundamental idea arising throughout mathematics. In the last section, 
we will say a bit about Andrew Wiles and his proof of Fermat’s Last Theorem. 

9.1 Abel and Galois 

Niels Abel was born in Frindoe, Norway, near Stavanger, in 1802. Norway was 
then suffering extreme poverty as a consequence of economic problems arising 
from European involvement in the Napoleonic wars. Abel’s family was very 
poor, although things improved a little in 1816 when his father, a Protestant 
minister, became involved in politics (Norway, which had been part of Den- 
mark, claimed independence, and then became a largely autonomous kingdom 
in a union with Sweden). The next year, Abel was sent to a school in Christiania 
(present day Oslo), but he was an ordinary student there with poor teachers (the 
best teachers having gone to the recently opened University of Christiania). But 
two years later, a new mathematics teacher, B. Holmboe, joined the school and 
inspired Abel to study mathematics. Holmboe was convinced Abel had great 
talent, and he encouraged him to read the works of contemporary masters. In 
1820, Abel’s father died; there was no money for Abel to complete his educa- 
tion nor to enter the University. But Holmboe continued his support, helping 
him to obtain a scholarship to enter the University in 1821; Abel graduated the 
following year. 


So, an inspiring teacher 
helped set the course of 
modern algebra. 




The importance of nu- 
merical examples can’t be 
overestimated. 


In 1821, while in his final year at the University of Christiania, Abel thought 
he had proved that quintic polynomials are solvable by radicals, and he submit- 
ted a paper to the Danish mathematician Degen for publication by the Royal 
Society of Copenhagen. Degen asked Abel to give a numerical example of his 
method and, in trying to do this, Abel discovered a mistake in his paper. De- 
gen had also advised Abel to study elliptic integrals, and Abel wrote several 
important fundamental papers on the subject. In 1824, Abel returned to quin- 
tic polynomials, proving that the general quintic polynomial is not solvable by 
radicals. 

In 1825, after Abel had done brilliant work in two areas of mathematics, the 
Norwegian government gave him a scholarship to travel abroad. He went to 
Germany and France, hoping to meet eminent mathematicians, but Gauss was 
not interested in Abel’s work on the quintic, and the mathematicians in Paris 
did not yet appreciate his remarkable theorems on elliptic functions. By 1827, 
Abel’s health deteriorated, he was heavily in debt, and he returned home to 
Norway. In 1828, he briefly returned to polynomials, proving a theorem de- 
scribing a class of polynomials (of any degree) that are solvable by radicals. 
By this time, Abel’s fame had spread to all mathematical centers. Legendre 
saw the new ideas in papers of Abel and of Jacobi, and he wrote 

Through these works you two ( Abel and Jacobi) will be placed in the 

class of the foremost analysts of our times. 

Strong efforts were made to secure a suitable position for Abel by a group from 
the French Academy, who addressed King Bernadotte of Norway-Sweden; 
Crelle also worked to secure a professorship for him in Berlin. But it was too 
late. Abel died in 1829, at age 26. 

An imprecise measure of Abel’s influence on modern mathematics is the 
number of areas named after him: abelian groups, abelian varieties, abelian 
differentials, abelian integrals, abelian categories, abelian extensions, abelian 
number fields, abelian functions. The Niels Henrik Abel Memorial Fund was 
established in 2002, and the Norwegian Academy of Science and Letters awards 
the Abel Prize for outstanding scientific work. 

Évariste Galois was born in Bourg La Reine, near Paris, in 1811. France, 
and especially Paris, was then in the throes of great political and social change 
as a consequence of the French Revolution in 1789, the Napoleonic era 1799-1815, 
the restoration of the French monarchy with King Louis XVIII in 1815, 
his succession by King Charles X in 1824, and another revolution in 1830. 

In April 1829, Galois' first mathematics paper (on continued fractions) was 
published; he was then 17 years old. In May and June, he submitted articles 
on the algebraic solution of equations to Cauchy at the Academy of Science. 
Cauchy advised him to rewrite his article, and Galois submitted On the condi- 
tion that an equation be solvable by radicals in February 1830. The paper was 
sent to Fourier, the secretary of the Academy, to be considered for the Grand 
Prize in mathematics. But Fourier died in April 1830, Galois’ paper was never 
subsequently found, and so it was never considered for the prize. July 1830 
saw another revolution. King Charles X fled France, and there was rioting in 
the streets of Paris. Later that year, Galois (now age 19) was arrested for mak- 
ing threats against the king at a public dinner, but he was acquitted. Galois 
was invited by Poisson to submit a third version of his memoir on equations to 
the Academy, and he did so in January 1831. On July 14, Galois was arrested 
again. While in prison he received a rejection of his memoir. Poisson reported 
that “His argument is neither sufficiently clear nor sufficiently developed to allow 
us to judge its rigor . . . There is no good way of deciding whether a given 
polynomial ... is solvable.” He did, however, encourage Galois to publish a 
more complete account of his work. 

In March 1832, a cholera epidemic swept Paris, and prisoners were trans- 
ferred to boarding houses. The prisoner Galois was moved to a pension, where 
he apparently fell in love with Stephanie-Felice du Motel, the daughter of the 
resident physician. After he was released on April 29, Galois exchanged letters 
with Stephanie, and it is clear that she tried to distance herself from the affair. 
Galois fought a duel on May 30, the reason for the duel not being clear but 
certainly linked with Stephanie. A note in the margin of the manuscript that 
Galois wrote the night before the duel reads, “There is something to complete 
in this proof. I do not have the time.” It is this note which has led to the legend 
that he spent his last night writing out all he knew about group theory (but this 
story appears to have been exaggerated). Galois was mortally wounded in the 
duel, and he died the next day, only 20 years old. His funeral was the focus of 
a Republican rally, and riots lasting several days followed. 

According to Galois’ wish, his friend Chevalier and Galois’ brother Alfred 
copied Galois’ mathematical papers and sent them to Gauss, Jacobi, and others. 
No record exists of any comment these men may have made. Eventually, the 
papers reached Liouville who, in September 1843, announced to the Academy 
that he had found in Galois’ papers a concise solution “. . . as correct as it is 
deep, of this lovely problem: given an irreducible equation of prime degree, 
decide whether or not it is solvable by radicals.” Liouville published these 
papers of Galois in his Journal in 1846. What Galois outlined in these papers 
is called Galois Theory today. 

The following quotation is from the Epilog of Tignol’s book [33], 

After the publication of Galois’ memoir by Liouville, its importance 
dawned upon the mathematical world, and it was eventually realized that 
Galois had discovered a mathematical gem much more valuable than 
any hypothetical external characterization of solvable equations. After 
all, the problem of solving equations by radicals was utterly artificial. 

It had focused the efforts of several generations of brilliant mathemati- 
cians because it displayed some strange, puzzling phenomena. It con- 
tained something mysterious, profoundly appealing. Galois had taken 
the pith out of the problem, by showing that the difficulty of an equation 
was related to the ambiguity of its roots and pointing out how this ambi- 
guity could be measured by means of a group. He had thus set the theory 
of equations and, indeed, the whole subject of algebra, on a completely 
different track. 

We have chosen Fermat’s Last Theorem as an organizing principle of this 
book, but an interesting abstract algebra text could be written centered on 
group theory and roots of polynomials. 

9.2 Solvability by Radicals 

Informally, a polynomial is solvable by radicals if there is a formula for its 
roots that generalizes the classical quadratic, cubic, and quartic formulas. Let 
us now examine the classical formulas to make this rather vague idea more 
precise. 


How to Think About It. Even though much of what we shall say applies 
to polynomials over any field, the reader may assume all fields coming up are 
subfields of the complex numbers $\mathbb{C}$. We point out, however, that some familiar 
results may not be true for all fields. For example, the quadratic formula 
doesn’t hold in k[x] when k has characteristic 2 (for $\frac{1}{2}$ doesn’t make sense in 
k); similarly, neither the cubic formula nor the quartic formula holds in k[x] 
when k has characteristic either 2 or 3. 


Definition. A field extension K/k is a pure extension if K = k(u), where 
$u^n \in k$ for some $n \ge 1$. 

In more detail, K = k(u), where u is a root of $x^n - a$ for some $a \in k$; 
that is, $u = \sqrt[n]{a}$, and so we are adjoining an nth root of a to k. But there are 
several nth roots of a in $\mathbb{C}$, namely 

$\sqrt[n]{a},\ \zeta\sqrt[n]{a},\ \ldots,\ \zeta^{n-1}\sqrt[n]{a},$ 

where $\zeta = e^{2\pi i/n} = \cos(2\pi/n) + i \sin(2\pi/n)$ is a primitive nth root of unity. To 
avoid having to decide which one to adjoin, let’s adjoin all of them by adjoining 
any one of them together with all the nth roots of unity. This is reasonable, for 
we are seeking formulas for roots of polynomial equations that involve square 
roots, cube roots, etc., and roots of numbers appear explicitly in the classical 
formulas. 

Let’s consider the classical formulas for polynomials of small degree, for 
we’ll see that they give rise to a sequence of pure extensions. 

Quadratics 

If $f(x) = x^2 + bx + c$, then the quadratic formula gives its roots as

$\tfrac{1}{2}\bigl(-b \pm \sqrt{b^2 - 4c}\bigr).$

Let $k = \mathbb{Q}(b, c)$. Define $K_1 = k(u)$, where $u = \sqrt{b^2 - 4c}$. Then $K_1$ is a
pure extension, for $u^2 \in k$. Moreover, the quadratic formula implies that $K_1$ is
the splitting field of $f$.

Cubics 

Let $f(X) = X^3 + bX^2 + cX + d$, and let $k = \mathbb{Q}(b, c, d)$. The change of
variable $X = x - \tfrac{1}{3}b$ yields a new polynomial $\tilde f(x) = x^3 + qx + r \in k[x]$
having the same splitting field $E$ (for if $u$ is a root of $\tilde f$, then $u - \tfrac{1}{3}b$ is a root
of $f$); it follows that $\tilde f$ is solvable by radicals if and only if $f$ is. The cubic
formula gives the roots of $\tilde f$ as

$g + h, \quad \omega g + \omega^2 h, \quad \text{and} \quad \omega^2 g + \omega h,$

where $g^3 = \tfrac{1}{2}(-r + \sqrt{R})$, $h = -q/(3g)$, $R = r^2 + \tfrac{4}{27}q^3$, and $\omega$ is a primitive
cube root of unity. Because of the constraint $gh = -\tfrac{1}{3}q$, each choice of
$g = \sqrt[3]{\tfrac{1}{2}(-r + \sqrt{R})}$ has a “mate,” namely $h = -q/(3g)$, $-q/(3\omega g) = \omega^2 h$, and
$-q/(3\omega^2 g) = \omega h$.

Define $K_1 = k(\sqrt{R})$, and define $K_2 = K_1(g)$, where $g^3 = \tfrac{1}{2}(-r + \sqrt{R})$.
The cubic formula shows that $K_2$ contains the root $g + h$ of $\tilde f$, where $h =
-q/(3g)$. Finally, define $K_3 = K_2(\omega)$, where $\omega^3 = 1$. The other roots of $\tilde f$ are
$\omega g + \omega^2 h$ and $\omega^2 g + \omega h$, both of which lie in $K_3$, and so $E \subseteq K_3$.
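
The cubic formula just described can be tried numerically. The following is a minimal sketch, not part of the text: the function name depressed_cubic_roots and the sample cubic $x^3 - 15x - 126$ are our own choices, and we assume $q \ne 0$ so that $g \ne 0$. Floating-point complex arithmetic stands in for the exact radicals $\sqrt{R}$, $g$, and $\omega$ that generate $K_1$, $K_2$, and $K_3$.

```python
import cmath

def depressed_cubic_roots(q, r):
    """Roots of x^3 + q*x + r via the cubic formula (assumes q != 0)."""
    omega = complex(-0.5, 3 ** 0.5 / 2)           # a primitive cube root of unity
    R = r * r + 4 * q ** 3 / 27                   # sqrt(R) generates K_1
    g = (0.5 * (-r + cmath.sqrt(R))) ** (1 / 3)   # one cube root; generates K_2
    h = -q / (3 * g)                              # the "mate" of g, so g*h = -q/3
    return [g + h, omega * g + omega ** 2 * h, omega ** 2 * g + omega * h]

# Sample cubic (our choice): x^3 - 15x - 126 = (x - 6)(x^2 + 6x + 21).
roots = depressed_cubic_roots(-15, -126)
```

Here $R = 124^2$ and $g^3 = 125$, so the first root returned is $g + h = 5 + 1 = 6$, up to rounding error.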

Thus, a sequence of pure extensions seems to capture the notion we are 
seeking. We now give the formal definition of solvability by radicals, after 
which we will show that all polynomials of degree $\le 4$ are solvable by radicals.

Definition. A radical extension of a field $k$ is a field extension $K/k$ for which
there exists a tower of field extensions

$k = K_0 \subseteq K_1 \subseteq \cdots \subseteq K_t = K,$

where $K_i/K_{i-1}$ is a pure extension for all $i \ge 1$.

A polynomial $f(x) \in k[x]$ is solvable by radicals if there is a splitting field
$E/k$ and a radical extension $K/k$ with $E \subseteq K$.

Quadratics are solvable by radicals, and the cubic formula shows that every
cubic $f(x) = x^3 + qx + r \in \mathbb{Q}[x]$ is solvable by radicals: a radical extension
containing a splitting field of $f$ is

$\mathbb{Q}(q, r) = K_0 \subseteq K_1 = K_0(\omega) \subseteq K_2 = K_1(\sqrt{R}) \subseteq K_3 = K_2(g),$


where we are using the notation in the cubic formula. 

Why do we say, in the definition of solvable by radicals, that $E \subseteq K$ instead
of $E = K$? That is, why don’t we say that some splitting field is a radical
extension? The answer is that a splitting field need not be a radical extension.
Consider the following theorem, due to Hölder.

Theorem 9.1 (Casus Irreducibilis). Let $f(x) \in \mathbb{Q}[x]$ be an irreducible cubic
having three real roots. If $E \subseteq \mathbb{C}$ is the splitting field of $f$ and $K$ is a radical
extension of $\mathbb{Q}$ containing $E$, then $K \not\subseteq \mathbb{R}$.

Proof. [25], p. 217. ■

If $f(x) \in \mathbb{Q}[x]$ is an irreducible cubic all of whose roots are real ($f(x) =
3x^3 - 3x + 1$ is such a cubic, by Example 6.58), then its splitting field $E \subseteq \mathbb{R}$.
We have just seen that $f$ is solvable by radicals, so there is a radical extension
$K/\mathbb{Q}$ with $E \subseteq K$. But the Casus Irreducibilis says that $K \not\subseteq \mathbb{R}$. Therefore,
$E \ne K$ (because $E \subseteq \mathbb{R}$); that is, the splitting field of $f$ is not itself a radical
extension.

Here is a more remarkable consequence of the Casus Irreducibilis. In down-to-earth
language, it says that any formula for the roots of an irreducible cubic
in $\mathbb{Q}[x]$ having all roots real requires the presence of complex numbers! After
all, the formula involves a cube root of unity. In other words, it is impossible
to “simplify” the cubic formula to eliminate $i$; we must use complex numbers
to find real roots! How would this have played in Piazza San Marco?
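
To see the phenomenon concretely, here is a small numerical sketch (our own illustration, with variable names of our choosing) that applies the cubic formula to the monic form $x^3 - x + \tfrac{1}{3}$ of the cubic $3x^3 - 3x + 1$ mentioned above. The quantity $R$ is negative, so $\sqrt{R}$ and $g$ are not real, and yet all three roots come out real up to rounding error.

```python
import cmath

# x^3 - x + 1/3 is (3x^3 - 3x + 1)/3, so q = -1 and r = 1/3.
q, r = -1.0, 1.0 / 3.0
R = r * r + 4 * q ** 3 / 27                   # negative, so sqrt(R) is purely imaginary
g = (0.5 * (-r + cmath.sqrt(R))) ** (1 / 3)   # a non-real cube root
h = -q / (3 * g)                              # here h is the complex conjugate of g
omega = complex(-0.5, 3 ** 0.5 / 2)           # a primitive cube root of unity
roots = [g + h, omega * g + omega ** 2 * h, omega ** 2 * g + omega * h]
```

The intermediate value g has a sizable imaginary part even though every entry of roots is real to within rounding; this is only a numerical illustration of the theorem, of course, not a proof of it.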

We now show that quartic polynomials are solvable by radicals. 

Proposition 9.2. Every polynomial $f(X) = X^4 + bX^3 + cX^2 + dX + e \in
\mathbb{Q}[X]$ is solvable by radicals.

$\mathbb{Q}$ can be replaced by any
field of characteristic 0.


Proof. Let $k = \mathbb{Q}(b, c, d, e)$. The change of variable $X = x - \tfrac{1}{4}b$ yields a
new polynomial $\tilde f(x) = x^4 + qx^2 + rx + s \in k[x]$; moreover, the splitting
field $E$ of $f$ is equal to the splitting field of $\tilde f$, for if $u$ is a root of $\tilde f$, then
$u - \tfrac{1}{4}b$ is a root of $f$. Factor $\tilde f$ in $\mathbb{C}[x]$:

$\tilde f(x) = x^4 + qx^2 + rx + s = (x^2 + jx + \ell)(x^2 - jx + m),$

and determine $j$, $\ell$, and $m$. Now $j^2$ is a root of the resolvent cubic

$(j^2)^3 + 2q(j^2)^2 + (q^2 - 4s)j^2 - r^2.$

The cubic formula gives $j^2$, from which we can determine $m$ and $\ell$, and hence
the roots of the quartic.

Define a radical extension

$k = K_0 \subseteq K_1 \subseteq K_2 \subseteq K_3,$

as in the cubic case, so that $j^2 \in K_3$. Define $K_4 = K_3(j)$ (so that $\ell, m \in K_4$).
Finally, define $K_5 = K_4(\sqrt{j^2 - 4\ell})$ and $K_6 = K_5(\sqrt{j^2 - 4m})$, giving roots
of the quadratic factors $x^2 + jx + \ell$ and $x^2 - jx + m$ of $\tilde f$. The quartic formula
gives $E \subseteq K_6$. Therefore, $f$ is solvable by radicals. ■
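
The claim that $j^2$ is a root of the resolvent cubic is easy to test with exact integer arithmetic. In the sketch below (the function names and the sample values $j = 2$, $\ell = 3$, $m = 5$ are our own), we multiply out the two quadratic factors to obtain $q$, $r$, $s$ and then evaluate the resolvent cubic at $j^2$.

```python
def expand_quartic(j, l, m):
    """Coefficients (q, r, s) of (x^2 + j*x + l)(x^2 - j*x + m) = x^4 + q*x^2 + r*x + s."""
    return l + m - j * j, j * (m - l), l * m

def resolvent(y, q, r, s):
    """Value of the resolvent cubic y^3 + 2q*y^2 + (q^2 - 4s)*y - r^2."""
    return y ** 3 + 2 * q * y ** 2 + (q * q - 4 * s) * y - r * r

q, r, s = expand_quartic(2, 3, 5)     # sample values, chosen arbitrarily
value = resolvent(2 ** 2, q, r, s)    # evaluates to 0
```

Comparing coefficients gives $\ell + m = q + j^2$, $j(m - \ell) = r$, and $\ell m = s$; eliminating $\ell$ and $m$ from these three equations is what produces the resolvent cubic.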

Example 9.3. $f(x) = x^5 - 1 \in \mathbb{Q}[x]$ is solvable by radicals. We know that
$f(x) = (x - 1)h(x)$, where $h$ is a quartic. But we have just seen that quartics
are solvable by radicals. (Actually, Gauss proved that $x^n - 1$ is solvable by
radicals for all $n \ge 1$, and this led to his construction of the regular 17-gon by
ruler and compass.) ▲

We have just seen that quadratics, cubics, and quartics in $\mathbb{Q}[x]$ are solvable
by radicals. Conversely, let $f(x) \in \mathbb{Q}[x]$ be a polynomial of any degree, and
let $E/\mathbb{Q}$ be a splitting field. If $f$ is solvable by radicals, we claim that there is
a formula that expresses its roots in terms of its coefficients. Suppose that

$\mathbb{Q} = K_0 \subseteq K_1 \subseteq \cdots \subseteq K_t$

is a radical extension with $E \subseteq K_t$. Let $z$ be a root of $f$. Now $z \in K_t =
K_{t-1}(u)$, where $u$ is an $m$th root of some element $a \in K_{t-1}$; hence, $z$ can be
expressed in terms of $u$ and $K_{t-1}$; that is, $z$ can be expressed in terms of $\sqrt[m]{a}$
and $K_{t-1}$. But $K_{t-1} = K_{t-2}(v)$, where some power of $v$ lies in $K_{t-2}$. Hence,
$z$ can be expressed in terms of $u$, $v$, and $K_{t-2}$. Ultimately, $z$ is expressed by a
formula analogous to the classical formulas. Therefore, solvability by radicals
has now been translated into the language of fields.

9.3 Symmetry 

Recognizing and exploiting symmetry is an important ingredient in geometry, 
algebra, number theory, and, indeed, in all of mathematics. 

Here is the basic idea: an object is symmetric if, when you transform it in a
certain way, you get the same object back. For example, what do we mean
when we say that an isosceles triangle $\Delta$ is symmetric? Figure 9.1 shows
$\Delta = \triangle ABC$ with its base $AB$ on the $x$-axis and with the $y$-axis being the
perpendicular-bisector of $AB$. Close your eyes; pretend that the $y$-axis is a
mirror, and let $\Delta$ be reflected in the $y$-axis (so that the vertices $A$ and $B$ are
interchanged); open your eyes. You cannot tell that $\Delta$ has been reflected; that
is, $\Delta$ is symmetric in the $y$-axis. On the other hand, if $\Delta$ were reflected in the
$x$-axis, then it would be obvious, once your eyes are reopened, that a reflection
had taken place; that is, $\Delta$ is not symmetric in the $x$-axis.

Here is a non-geometric example: the polynomial $f(x, y) = x^3 + y^3 - xy$
is symmetric because, if you transform it by interchanging $x$ and $y$, you get
the same polynomial back. Another example arises from $g(x) = x^6 - x^2 + 3$.
This polynomial is symmetric because $g(-x) = g(x)$; this symmetry induces
symmetry of the graph of $g(x)$ in the $y$-axis, for $(-x, y)$ lies on the graph if
and only if $(x, y)$ does.

The transformations involved in defining symmetry are usually permuta- 
tions. 

Definition. A permutation of a set $X$ is a bijection $\alpha\colon X \to X$.

Here is a precise definition of symmetry in geometry. 

Definition. An isometry of the plane is a function $\varphi\colon \mathbb{R}^2 \to \mathbb{R}^2$ that is distance
preserving: for all points $P = (a, b)$ and $Q = (c, d)$ in $\mathbb{R}^2$,

$\|\varphi(P) - \varphi(Q)\| = \|P - Q\|,$

where $\|P - Q\| = \sqrt{(a - c)^2 + (b - d)^2}$ is the distance from $P$ to $Q$.

A symmetry of a subset $\Omega$ of the plane is an isometry $\sigma$ with $\sigma(\Omega) = \Omega$
(by definition, $\sigma(\Omega) = \{\sigma(\omega) : \omega \in \Omega\}$).

It is clear that every isometry is an injection, for if $P \ne Q$, then $\|P - Q\| \ne
0$, hence $\|\sigma(P) - \sigma(Q)\| \ne 0$, and $\sigma(P) \ne \sigma(Q)$. It is also true (but harder
to prove) that isometries are surjections ([26], p. 141). Thus, isometries are
bijections; that is, they are permutations of the plane.

Some figures have more symmetries than others. Consider the triangles in 
Figure 9.2. The first, equilateral, triangle has six symmetries: rotations by 120°, 
240°, 360° = 0° about its center, and reflections in each of the three angle 
bisectors. The second, isosceles, triangle has only two symmetries, the identity 
isometry and the reflection in the angle bisector, while the scalene triangle 
has only one symmetry, the identity isometry. A circle has infinitely many 
symmetries (for example, all rotations about its center). 
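
The counts 6, 2, and 1 can be checked by machine. The sketch below rests on an assumption not proved in the text: a symmetry of a triangle permutes its three vertices and is determined by that permutation, and every permutation of the vertices that preserves all pairwise distances arises from a symmetry. The function name and the coordinates of the three sample triangles are our own.

```python
from itertools import permutations
from math import dist, isclose, sqrt

def count_symmetries(vertices):
    """Count the vertex permutations that preserve all pairwise distances."""
    n = len(vertices)
    count = 0
    for perm in permutations(range(n)):
        if all(
            isclose(dist(vertices[i], vertices[j]),
                    dist(vertices[perm[i]], vertices[perm[j]]), abs_tol=1e-9)
            for i in range(n) for j in range(i + 1, n)
        ):
            count += 1
    return count

equilateral = [(0.0, 0.0), (1.0, 0.0), (0.5, sqrt(3) / 2)]
isosceles = [(-1.0, 0.0), (1.0, 0.0), (0.0, 3.0)]
scalene = [(0.0, 0.0), (4.0, 0.0), (1.0, 2.0)]
```

Calling count_symmetries on these three lists gives 6, 2, and 1, in agreement with the discussion above. A tolerance is used because the coordinates of the equilateral triangle are only floating-point approximations.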


Reflection in the y-axis
is the function $(x, y) \mapsto (-x, y)$.

Reflection in the x-axis
is the function $(x, y) \mapsto (x, -y)$.


Now you see how important
the Pythagorean
Theorem really is; it allows
us to define distance.

Figure 9.2. Triangles.


We now introduce symmetry in an algebraic setting. 

Definition. An automorphism of a commutative ring $R$ is an isomorphism
$\sigma\colon R \to R$. Given a field extension $E/k$, an automorphism $\sigma\colon E \to E$ fixes $k$
if $\sigma(a) = a$ for every $a \in k$.

The following theorem should remind you of Theorem 3.12 (which is the
special case when $E/k = \mathbb{C}/\mathbb{R}$ and $\sigma$ is complex conjugation). Of course,
automorphisms are certain kinds of permutations.

Theorem 9.4. Let $k$ be a field, let $f(x) \in k[x]$, and let $E/k$ be a splitting
field of $f$. If $\sigma\colon E \to E$ is an automorphism fixing $k$, then $\sigma$ permutes the set
$\Omega$ of all the roots of $f$.

Proof. Let $f(x) = a_0 + a_1 x + \cdots + a_n x^n$, where $a_i \in k$ for all $i$. If $u \in E$
is a root of $f$, then $f(u) = 0$ and

$0 = \sigma(f(u))$

$= \sigma(a_0 + a_1 u + \cdots + a_n u^n)$

$= \sigma(a_0) + \sigma(a_1)\sigma(u) + \cdots + \sigma(a_n)\sigma(u^n)$

$= a_0 + a_1\sigma(u) + \cdots + a_n\sigma(u)^n$

$= f(\sigma(u)).$

Therefore, $\sigma(u)$ is also a root of $f$, so that $\sigma(\Omega) \subseteq \Omega$; that is, $\operatorname{im}(\sigma|\Omega) \subseteq \Omega$,
and the restriction $\sigma|\Omega$ is a function $\Omega \to \Omega$. But $\sigma|\Omega$ is an injection, because
$\sigma$ is, and the Pigeonhole Principle, Exercise A.11 on page 419, says that it is a
permutation. ■

The following definition, due to E. Artin around 1930, modernizes and sim- 
plifies Galois’ original definition given 100 years earlier (it is equivalent to 
Galois’ definition). 

Definition. If $k$ is a field, $f(x) \in k[x]$, and $E/k$ is a splitting field of $f$, then
the Galois group of $f$ is

$\mathrm{Gal}(f) = \{\text{automorphisms } \sigma\colon E \to E \text{ fixing } k\}.$

Just as some triangles are more symmetric than others, some polynomials
are more symmetric than others. For example, consider $f(x) = x^2 - 2$ and
$g(x) = x^2 - 9$, where we consider both polynomials as lying in $\mathbb{Q}[x]$. The
splitting field of $f$ is $E = \mathbb{Q}(\sqrt{2})$, and there is an automorphism $\sigma\colon E \to E$
that interchanges the roots $\sqrt{2}$ and $-\sqrt{2}$, namely $\sigma\colon a + b\sqrt{2} \mapsto a - b\sqrt{2}$. On
the other hand, the splitting field of $g$ is $\mathbb{Q}$, for both 3 and $-3$ lie in $\mathbb{Q}$, and so
$\mathrm{Gal}(g)$ consists only of the identity permutation.

The astute reader may have noticed that $\mathrm{Gal}(f)$ really depends only on the
fields $k$ and $E$; two polynomials in $k[x]$ having the same splitting field have
the same Galois group. For this reason, we usually write

$\mathrm{Gal}(f) = \mathrm{Gal}(E/k).$

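Example 9.5. We show that not every permutation of the roots of a polynomial
$f$ is the restriction of some automorphism $\sigma \in \mathrm{Gal}(f)$. We saw in Example 3.7
that the roots of

$f(x) = x^4 - 10x^2 + 1 \in \mathbb{Q}[x]$

are

$\alpha = \sqrt{2} + \sqrt{3}, \quad \beta = \sqrt{2} - \sqrt{3}, \quad \gamma = -\sqrt{2} + \sqrt{3}, \quad \delta = -\sqrt{2} - \sqrt{3}.$

Let $E/\mathbb{Q}$ be a field extension containing these four roots, and let $\pi$ be the
permutation that interchanges $\beta$ and $\gamma$ and fixes the other two roots:

$\pi(\alpha) = \alpha, \quad \pi(\beta) = \gamma, \quad \pi(\gamma) = \beta, \quad \pi(\delta) = \delta.$

In $E$, we have $\alpha - \beta = 2\sqrt{3}$. Suppose there is an automorphism $\sigma$ of $E$ with
$\sigma|\{\alpha, \beta, \gamma, \delta\} = \pi$. Then $\sigma(\alpha - \beta) = \sigma(2\sqrt{3})$, and

$\sigma(\alpha - \beta) = \sigma(\alpha) - \sigma(\beta) = \pi(\alpha) - \pi(\beta) = \alpha - \gamma = 2\sqrt{2}.$

Hence, $\sigma(2\sqrt{3}) = 2\sqrt{2}$. Square both sides:

$\sigma(2\sqrt{3})^2 = (2\sqrt{2})^2 = 8.$

The left-hand side is $\sigma(2\sqrt{3})^2 = \sigma((2\sqrt{3})^2) = \sigma(12) = 12$, and this is a
contradiction. Therefore, $\pi \notin \mathrm{Gal}(f)$. ▲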
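
One can also list the permutations of the roots that do come from automorphisms. The sketch below assumes a standard fact not proved here: the automorphisms of $E = \mathbb{Q}(\sqrt{2}, \sqrt{3})$ fixing $\mathbb{Q}$ are exactly the four maps sending $\sqrt{2} \mapsto \pm\sqrt{2}$ and $\sqrt{3} \mapsto \pm\sqrt{3}$. The encoding of field elements as 4-tuples and all names are our own.

```python
from itertools import product

# a + b*sqrt2 + c*sqrt3 + d*sqrt6 in Q(sqrt2, sqrt3) is stored as (a, b, c, d).
ROOTS = {
    "alpha": (0, 1, 1, 0),     #  sqrt2 + sqrt3
    "beta": (0, 1, -1, 0),     #  sqrt2 - sqrt3
    "gamma": (0, -1, 1, 0),    # -sqrt2 + sqrt3
    "delta": (0, -1, -1, 0),   # -sqrt2 - sqrt3
}

def automorphism(e2, e3):
    """The field map with sqrt2 -> e2*sqrt2 and sqrt3 -> e3*sqrt3 (e2, e3 = 1 or -1)."""
    return lambda x: (x[0], e2 * x[1], e3 * x[2], e2 * e3 * x[3])

def induced_permutation(sigma):
    """The permutation of the four root names obtained by applying sigma to each root."""
    name_of = {coords: name for name, coords in ROOTS.items()}
    return {name: name_of[sigma(coords)] for name, coords in ROOTS.items()}

galois_permutations = [induced_permutation(automorphism(e2, e3))
                       for e2, e3 in product((1, -1), repeat=2)]
pi = {"alpha": "alpha", "beta": "gamma", "gamma": "beta", "delta": "delta"}
```

Only 4 of the 24 permutations of the roots appear in galois_permutations, and pi is not among them, in agreement with the example.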

An important class of symmetric polynomials are the elementary symmetric
polynomials in $n$ variables $\alpha_1, \ldots, \alpha_n$. For two variables, the elementary
symmetric polynomials are $\alpha_1 + \alpha_2$ and $\alpha_1\alpha_2$. For three variables, they are

$s_1 = \alpha_1 + \alpha_2 + \alpha_3,$

$s_2 = \alpha_1\alpha_2 + \alpha_1\alpha_3 + \alpha_2\alpha_3,$

$s_3 = \alpha_1\alpha_2\alpha_3$

and, for $n$ variables, $s_i$ is the sum of all products of the $\alpha_j$, taken $i$ at a time.
We’ve met these before in the context of roots of polynomials. The coefficients
of a monic polynomial are the elementary symmetric polynomials of its roots,
with alternating sign, so that, for example,

$(x - \alpha_1)(x - \alpha_2)(x - \alpha_3)$

$= x^3 - (\alpha_1 + \alpha_2 + \alpha_3)x^2 + (\alpha_1\alpha_2 + \alpha_1\alpha_3 + \alpha_2\alpha_3)x - \alpha_1\alpha_2\alpha_3.$

It turns out that every symmetric polynomial in $n$ variables can be expressed
as a polynomial in the elementary symmetric polynomials $s_i$ (see [2], Chapter IIG).
For example,

$\alpha_1^2 + \alpha_2^2 + \alpha_3^2 = (\alpha_1 + \alpha_2 + \alpha_3)^2 - 2(\alpha_1\alpha_2 + \alpha_1\alpha_3 + \alpha_2\alpha_3)$

$= s_1^2 - 2s_2.$

As usual, $\omega = \tfrac{1}{2}(-1 + i\sqrt{3})$.


A CAS is a great help here
(see Appendix A.6).


Exercise 9.1 shows how
to derive the quadratic
formula without completing
the square.


Example 9.6. The elementary symmetric polynomials can be used to give alternate
derivations of the quadratic and cubic formulas (Exercises 9.1 and 9.3
below). Let’s sketch a derivation of the cubic formula along these lines.
Assume our cubic has been reduced and is, as before, of the form

$x^3 + qx + r.$

Suppose further that we let its roots be $\alpha_1$, $\alpha_2$, and $\alpha_3$. Then we know that

$\alpha_1 + \alpha_2 + \alpha_3 = 0,$
$\alpha_1\alpha_2 + \alpha_1\alpha_3 + \alpha_2\alpha_3 = q,$
$\alpha_1\alpha_2\alpha_3 = -r.$

Form the two expressions $s$ and $u$:

$s = \alpha_1 + \alpha_2\omega + \alpha_3\omega^2$
$u = \alpha_1 + \alpha_2\omega^2 + \alpha_3\omega.$

So, we have three expressions in the roots:

$0 = \alpha_1 + \alpha_2 + \alpha_3$
$s = \alpha_1 + \alpha_2\omega + \alpha_3\omega^2$
$u = \alpha_1 + \alpha_2\omega^2 + \alpha_3\omega.$

Adding the equations, and using $1 + \omega + \omega^2 = 0$, we see that $s + u = 3\alpha_1$. Hence, if $s$ and $u$ can be
expressed in terms of $q$ and $r$, then $\alpha_1$ can be so expressed (and, by symmetry,
the other roots can be expressed in terms of $q$ and $r$). Experimenting with a
CAS or by hand, we find that

$su = \alpha_1^2 + \alpha_2^2 + \alpha_3^2 - \alpha_1\alpha_2 - \alpha_1\alpha_3 - \alpha_2\alpha_3$

$= (\alpha_1 + \alpha_2 + \alpha_3)^2 - 3(\alpha_1\alpha_2 + \alpha_1\alpha_3 + \alpha_2\alpha_3)$

$= 0 - 3q = -3q.$

Expanding $s^3 + u^3$ and factoring the result, we get

$s^3 + u^3$

$= -(\alpha_1 + \alpha_2 - 2\alpha_3)(\alpha_1 + \alpha_3 - 2\alpha_2)(\alpha_2 + \alpha_3 - 2\alpha_1)$

$= -(\alpha_1 + \alpha_2 + \alpha_3 - 3\alpha_3)(\alpha_1 + \alpha_2 + \alpha_3 - 3\alpha_2)(\alpha_1 + \alpha_2 + \alpha_3 - 3\alpha_1)$

$= -(-3\alpha_3)(-3\alpha_2)(-3\alpha_1)$

$= 27\alpha_1\alpha_2\alpha_3 = -27r.$

From $su = -3q$, we get $s^3u^3 = -27q^3$. Coupled with $s^3 + u^3 = -27r$, we
see that $s^3$ and $u^3$ are roots of the quadratic polynomial

$x^2 + 27rx - 27q^3.$

We can solve this for $s^3$ and $u^3$, take cube roots, and recover $\alpha_1$, leading to
Cardano’s formula (Exercise 9.3 below). ▲
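
The steps of this example translate directly into a short computation. The following sketch is our own (the function name root_via_resolvents and the sample cubic $x^3 - 15x - 126$ are not from the text), it assumes $q \ne 0$ so that $s \ne 0$, and it uses floating-point complex arithmetic in place of exact radicals.

```python
import cmath

def root_via_resolvents(q, r):
    """One root of x^3 + q*x + r, following the steps of Example 9.6 (assumes q != 0)."""
    # s^3 and u^3 are the two roots of x^2 + 27*r*x - 27*q^3.
    disc = cmath.sqrt((27 * r) ** 2 + 4 * 27 * q ** 3)
    s_cubed = (-27 * r + disc) / 2
    s = s_cubed ** (1 / 3)     # any cube root of s^3 will do
    u = -3 * q / s             # forced by the relation s*u = -3q
    return (s + u) / 3         # because s + u = 3*alpha_1

# Sample cubic (our choice): x^3 - 15x - 126, which has the root 6.
alpha1 = root_via_resolvents(-15, -126)
```

For this cubic the quadratic is $x^2 - 3402x + 91125$, with roots $3375 = 15^3$ and $27 = 3^3$, so $s = 15$, $u = 3$, and $\alpha_1 = 6$ up to rounding error. Choosing $u$ from $su = -3q$, rather than taking an independent cube root of $u^3$, is what keeps the two cube roots compatible.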

Exercises 

9.1 * Suppose the roots of $x^2 + bx + c$ are $\alpha$ and $\beta$. Find, without using the quadratic
formula, an expression for $\alpha - \beta$ in terms of $\alpha + \beta$ and $\alpha\beta$. Use it and the
fact that $\alpha + \beta = -b$ to find $\alpha$ in terms of $b$ and $c$.

9.2 Find the roots of the cubic from page 83,

$x^3 - 18x - 35,$

using the method of Example 9.6. 

9.3 * Finish the derivation of the cubic formula outlined in Example 9.6. 

9.4 Groups 

Galois invented groups to exploit symmetry. Our purpose here is only to dis- 
play Galois’ ideas in enough detail so that Theorem 9.16 below is plausible; 
we wish to dispel some of the mystery that would arise if we merely cited the 
ultimate result (you can follow the proofs in [26], Chapter 5). 

Commutative rings are sets with two binary operations; a group is a set
having only one binary operation. Permutations, as any functions from a set
$X$ to itself, can be composed and, as we show in Appendix A.1, composition
equips the family of all permutations of $X$ with a binary operation. This
viewpoint begets a kind of algebra, called group theory.

Definition. A group is a nonempty set $G$ with a binary operation

$*\colon G \times G \to G,$

where $*\colon (a, b) \mapsto a * b$, satisfying the following properties:

(i) $(a * b) * c = a * (b * c)$ for all $a, b, c \in G$,

(ii) there is $e \in G$ with $e * a = a = a * e$ for all $a \in G$,

(iii) for all $a \in G$, there is $a' \in G$ with $a' * a = e = a * a'$.

The element $e$ is called the identity of $G$, and the element $a'$ is called the
inverse of $a$ (the inverse of $a$ is usually denoted by $a^{-1}$).

It is not difficult to prove, for groups as for commutative rings, that the
identity element is unique (if $e' * a = a = a * e'$ for all $a \in G$, then $e' = e$),
and the inverse of every element is unique (if $a'' * a = e = a * a''$, then
$a'' = a'$).

Example 9.7. Theorem A.12 in Appendix A.1 shows that $S_X$, the family of
all the permutations of a nonempty set $X$, is a group with composition as its
binary operation. In the special case when $X = \{1, 2, \ldots, n\}$, denote $S_X$ by

$S_n,$

and call it the symmetric group on $n$ letters. ▲
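
For a small case such as $S_3$, the group axioms can be verified by brute force. In the sketch below (our own encoding, not the text’s), a permutation of $\{0, 1, 2\}$ is stored as a tuple p in which p[i] is the image of i; we use $\{0, 1, 2\}$ rather than $\{1, 2, 3\}$ only because Python indexes from 0.

```python
from itertools import permutations

def compose(s, t):
    """The composite 's after t': apply t first, then s."""
    return tuple(s[t[i]] for i in range(len(t)))

def inverse(s):
    """The inverse permutation of s."""
    inv = [0] * len(s)
    for i, image in enumerate(s):
        inv[image] = i
    return tuple(inv)

S3 = list(permutations(range(3)))
identity = tuple(range(3))

closed = all(compose(s, t) in S3 for s in S3 for t in S3)
associative = all(compose(compose(a, b), c) == compose(a, compose(b, c))
                  for a in S3 for b in S3 for c in S3)
has_identity = all(compose(identity, s) == s == compose(s, identity) for s in S3)
has_inverses = all(compose(inverse(s), s) == identity == compose(s, inverse(s))
                   for s in S3)
```

All four flags come out True, and S3 has $3! = 6$ elements. Such a check is no substitute for the proof in Appendix A.1, which works for every set $X$.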

Example 9.8. Just because we call a Galois group a group doesn’t make it so.
Recall that the Galois group $\mathrm{Gal}(E/k)$ of a field extension $E/k$ consists of all
the automorphisms $\sigma$ of $E$ that fix $k$. We now show that $\mathrm{Gal}(E/k)$ with binary
operation composition is a group.

If $\sigma, \tau \in \mathrm{Gal}(E/k)$, then their composite $\tau\sigma$ is an automorphism of $E$
fixing $k$; that is, $\tau\sigma \in \mathrm{Gal}(E/k)$, so that composition is a binary operation
on $\mathrm{Gal}(E/k)$. Proposition A.5 says that composition of functions is always
associative. The identity $1_E\colon E \to E$ is an automorphism fixing $k$, and it is
a routine calculation to show that if $\sigma \in \mathrm{Gal}(E/k)$, then its inverse $\sigma^{-1}$ also
lies in $\mathrm{Gal}(E/k)$. In particular, if $f(x) \in k[x]$ and $E/k$ is a splitting field
of $f$, then $\mathrm{Gal}(f) = \mathrm{Gal}(E/k)$ is a group. We have assigned a group to every
polynomial. ▲

Example 9.9. The set $G = \mathrm{GL}(2, \mathbb{R})$ of all nonsingular $2 \times 2$ matrices with
real entries is a group with binary operation matrix multiplication $\mu$. First, we
have $\mu\colon G \times G \to G$ because the product of two nonsingular matrices is also
nonsingular. Matrix multiplication is associative, and the identity matrix $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$
is the identity element. Finally, nonsingular matrices have inverses (this is the
definition of nonsingular!), and so $G$ is a group. ▲


How to Think About It. We warn the reader that new terms are going to 
be introduced at a furious pace. You need not digest everything; if a new idea 
seems only a little reasonable, continue reading nevertheless. One way to keep 
your head above water is to see that definitions and constructions for groups
(subgroups, homomorphisms, kernels, normal subgroups, quotient groups) are
parallel to what we have already done for commutative rings (subrings,
homomorphisms, kernels, ideals, quotient rings). Your reward will be a better
appreciation of the beautiful results of Abel and Galois.


Definition. A subgroup of a group $G$ is a nonempty subset $S \subseteq G$ such that
$s, t \in S$ implies $s * t \in S$ and $s \in S$ implies $s^{-1} \in S$.

Subgroups $S \subseteq G$ are themselves groups, for they satisfy the axioms in
the definition. In particular, since $S$ is not empty, there is some $s \in S$; by
definition, its inverse $s^{-1}$ also lies in $S$, and so $e = s * s^{-1}$ lies in $S$.


Food for Thought. If $\Omega$ is a set, we may view any subgroup of $S_\Omega$, the
group of all permutations of $\Omega$, as symmetries of it. The notion of symmetry
depends on the permutation group: isometries of the plane are one kind
of symmetry; another kind arises from the group of all homeomorphisms of 
the plane; yet another arises from the group of all nonsingular linear transfor- 
mations. This observation is the basis of Klein’s Erlanger Programm, which 
classifies different types of geometries according to which geometric proper- 
ties of figures are left invariant. 


Multiplication in a commutative ring $R$ is, by definition, commutative: if
$a, b \in R$, then $ab = ba$. But multiplication in a group need not be commutative:
$a * b$ and $b * a$ may be different. For example, composition in the symmetric
group $S_3$ is not commutative: define $\sigma, \tau \in S_3$ by $\sigma(1) = 2$, $\sigma(2) = 1$,
$\sigma(3) = 3$, and $\tau(1) = 1$, $\tau(2) = 3$, $\tau(3) = 2$. It is easy to see that $\sigma\tau \ne \tau\sigma$
(for $\sigma\tau(1) = \sigma(1) = 2$ and $\tau\sigma(1) = \tau(2) = 3$).

You also know that the product of two matrices depends on which is written
first: $AB \ne BA$ is possible. Hence, the group $\mathrm{GL}(2, \mathbb{R})$ is not commutative.
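
Both failures of commutativity are easy to exhibit by direct computation. In this sketch the permutations $\sigma$ and $\tau$ are the ones defined above, stored as dictionaries; the two matrices A and B are sample nonsingular matrices of our own choosing, since the text does not name any.

```python
sigma = {1: 2, 2: 1, 3: 3}
tau = {1: 1, 2: 3, 3: 2}

def compose(f, g):
    """The composite 'f after g': apply g first, then f."""
    return {x: f[g[x]] for x in g}

sigma_tau = compose(sigma, tau)    # sends 1 to sigma(tau(1)) = sigma(1) = 2
tau_sigma = compose(tau, sigma)    # sends 1 to tau(sigma(1)) = tau(2) = 3

def matmul(A, B):
    """Product of two 2 x 2 matrices given as nested tuples."""
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

A = ((1, 1), (0, 1))    # determinant 1, hence nonsingular
B = ((1, 0), (1, 1))    # determinant 1, hence nonsingular
AB = matmul(A, B)
BA = matmul(B, A)
```

Here sigma_tau and tau_sigma differ at 1, and AB and BA differ in their upper-left entries (2 and 1, respectively).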


Definition. A group $G$ is abelian if $a * b = b * a$ for all $a, b \in G$.

Abel’s 1828 theorem says, in modern language, that a polynomial with an
abelian Galois group is solvable by radicals; this is why abelian groups are so-called.
From now on, we shall simplify notation by writing the product of two
group elements as $ab$ instead of $a * b$ and the identity as 1 instead of $e$.

Proposition 9.10. If $k$ is a field, $a \in k$, and $k$ contains all the $n$th roots of
unity, $\omega, \omega^2, \ldots, \omega^n = 1$, then the Galois group of $f(x) = x^n - a$ is abelian.

Sketch of proof. Since $k$ contains all the $n$th roots of unity,

$E = k(\omega^i\beta : \beta^n = a \text{ and } 1 \le i \le n)$

is a splitting field of $f$. Any automorphism $\sigma \in \mathrm{Gal}(E/k)$ fixes $\omega$ (because $\omega \in k$)
and must permute the roots, and so $\sigma(\beta) = \omega^j\beta$ for some $j$, whence
$\sigma(\omega^i\beta) = \omega^{i+j}\beta$. Similarly, if $\tau$ is another automorphism, then
$\tau(\omega^i\beta) = \omega^{i+\ell}\beta$ for some $\ell$. It follows that both $\sigma\tau$ and $\tau\sigma$ send $\omega^i\beta$
to $\omega^{i+j+\ell}\beta$; from this fact it is not hard to see that $\mathrm{Gal}(E/k)$ is an
abelian group. ■

Galois was able to translate a polynomial being solvable by radicals into a 
certain property of its Galois group by constructing analogs for groups of the 
constructions we have done in earlier chapters for commutative rings. 

Definition. If $G$ and $H$ are groups, then a homomorphism is a function
$\varphi\colon G \to H$ such that, for all $a, b \in G$,

$\varphi(ab) = \varphi(a)\varphi(b).$

An isomorphism is a homomorphism that is a bijection. If there is an isomorphism
$\varphi\colon G \to H$, then we say that $G$ and $H$ are isomorphic and we write
$G \cong H$.

We can be more precise. If groups are denoted by $(G, *)$ and $(H, \circ)$, where
$*$ and $\circ$ are binary operations, then a homomorphism $\varphi\colon G \to H$ is a function
for which

$\varphi(a * b) = \varphi(a) \circ \varphi(b).$

It is easy to see that if $\varphi$ is a homomorphism, then $\varphi(1) = e$, where 1 is the
identity of $G$ and $e$ is the identity of $H$; moreover, for each $a \in G$, we have
$\varphi(a^{-1}) = \varphi(a)^{-1}$ (the latter being the inverse of the element $\varphi(a)$ in $H$). If
$a, b \in G$ commute, then $ab = ba$. Hence, if $\varphi\colon G \to H$ is a homomorphism,
then

$\varphi(a)\varphi(b) = \varphi(ab) = \varphi(ba) = \varphi(b)\varphi(a).$

It follows that if $G$ is abelian and $\varphi$ is an isomorphism, then $H$ is abelian.
Every polynomial determines a group of symmetries of its roots. 

Theorem 9.11. If a polynomial $f(x) \in k[x]$ has $n$ roots, then its Galois group
$\mathrm{Gal}(E/k)$ is isomorphic to a subgroup of the symmetric group $S_n$.

Proof. By Theorem 9.4, elements of $\mathrm{Gal}(E/k)$ permute the roots of $f$. Now
see [26], p. 454. ■


There is always a primitive
$n$th root of unity; that
is, an element $\omega$ with
$\omega^n = 1$ such that every
$n$th root of unity is a
power of $\omega$. In particular,
a primitive complex $n$th
root of unity is $e^{2\pi i/n} =
\cos(2\pi/n) + i\sin(2\pi/n)$.


Of course, abstract algebra
did not exist in Galois’ time.
In particular, rings, fields,
and homomorphisms were
not in anyone’s vocabulary;
nor were groups.


If $X$ and $Y$ are sets
of $n$ elements, then
$S_X \cong S_n \cong S_Y$. Thus,
groups don’t care if you
are permuting $n$ numbers,
$n$ roots, or $n$ monkeys.

Definition. The kernel of a homomorphism $\varphi\colon G \to H$ of groups is

$\ker\varphi = \{a \in G : \varphi(a) = 1\},$

where 1 is the identity element of $H$. The image of $\varphi$ is

$\operatorname{im}\varphi = \{h \in H : h = \varphi(g) \text{ for some } g \in G\}.$

If $\varphi\colon G \to H$ is a homomorphism, then $\ker\varphi$ is a subgroup of $G$ and $\operatorname{im}\varphi$
is a subgroup of $H$.

Just as the kernel of a ring homomorphism has special properties (it is an
ideal), so, too, is the kernel of a group homomorphism special. If $a \in \ker\varphi$
and $b \in G$, then

$\varphi(bab^{-1}) = \varphi(b)\varphi(a)\varphi(b^{-1})$

$= \varphi(b)\,1\,\varphi(b)^{-1}$

$= \varphi(b)\varphi(b)^{-1} = 1.$

Definition. A subgroup $N$ of a group $G$ is a normal subgroup if, for each
$a \in N$, we have $bab^{-1} \in N$ for every $b \in G$.

Thus, kernels of homomorphisms are always normal subgroups. In an abelian
group, every subgroup is normal but, in general, there are subgroups that are
not normal. For example, if $\sigma$ is the permutation of $X = \{1, 2, 3\}$ that interchanges
1 and 2 and fixes 3, then $S = \{e, \sigma\}$ is a subgroup of the symmetric
group $S_3$. But $S$ is not a normal subgroup of $S_3$, for if $\tau$ is the permutation that
fixes 1 and interchanges 2 and 3, then $\tau\sigma\tau^{-1}(1) = \tau\sigma(1) = \tau(2) = 3$. Hence,
$\tau\sigma\tau^{-1} \ne e$ and $\tau\sigma\tau^{-1} \ne \sigma$; that is, $\sigma \in S$ but $\tau\sigma\tau^{-1} \notin S$.
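
Normality of a subgroup of a small group can be tested exhaustively. The sketch below is our own: permutations of $\{0, 1, 2\}$ (rather than $\{1, 2, 3\}$) are stored as tuples, is_normal is a hypothetical helper name, and A3 denotes the subgroup consisting of the identity and the two 3-cycles, which is not discussed in the text but serves as a contrasting example.

```python
from itertools import permutations

def compose(s, t):
    """The composite 's after t': apply t first, then s."""
    return tuple(s[t[i]] for i in range(len(t)))

def inverse(s):
    """The inverse permutation of s."""
    inv = [0] * len(s)
    for i, image in enumerate(s):
        inv[image] = i
    return tuple(inv)

def is_normal(N, G):
    """True if b a b^{-1} lies in N for every a in N and every b in G."""
    return all(compose(compose(b, a), inverse(b)) in N for a in N for b in G)

S3 = set(permutations(range(3)))
e = (0, 1, 2)
sigma = (1, 0, 2)                   # interchanges 0 and 1, fixes 2
S = {e, sigma}
A3 = {e, (1, 2, 0), (2, 0, 1)}      # the identity and the two 3-cycles
```

The call is_normal(S, S3) returns False, matching the computation above, while is_normal(A3, S3) returns True.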

Just as we used ideals to construct quotient rings, we can use normal subgroups
to construct quotient groups. If $N$ is a subgroup of a group $G$, define
certain subsets, called cosets of $N$ in $G$, as follows: if $a \in G$, then

$aN = \{as : s \in N\} \subseteq G.$

The family of all cosets of $N$ is denoted by

$G/N = \{\text{all cosets } aN : a \in G\}.$

When $N$ is a normal subgroup, then $G/N$ is a group if we define a binary
operation by

$aN * bN = abN$

(normality of $N$ is needed to prove that this multiplication is well-defined: if
$a'N = aN$ and $b'N = bN$, then $a'b'N = abN$). The group $G/N$ is called
the quotient group.
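
The role of normality in making coset multiplication well-defined can be seen in $S_3$. The following sketch uses our own encoding (permutations of $\{0, 1, 2\}$ as tuples) and our own helper names; A3 is the subgroup of the identity and the two 3-cycles, which is normal, and S is the two-element subgroup generated by a transposition, which is not.

```python
from itertools import permutations

def compose(s, t):
    """The composite 's after t': apply t first, then s."""
    return tuple(s[t[i]] for i in range(len(t)))

def coset(a, N):
    """The coset aN = {as : s in N}."""
    return frozenset(compose(a, s) for s in N)

def product_is_well_defined(N, G):
    """Check that a'N = aN and b'N = bN always force a'b'N = abN."""
    return all(coset(compose(a, b), N) == coset(compose(a2, b2), N)
               for a in G for b in G
               for a2 in coset(a, N) for b2 in coset(b, N))

S3 = list(permutations(range(3)))
e = (0, 1, 2)
A3 = [e, (1, 2, 0), (2, 0, 1)]    # a normal subgroup of S_3
S = [e, (1, 0, 2)]                # a subgroup that is not normal

cosets_of_A3 = {coset(a, A3) for a in S3}
cosets_of_S = {coset(a, S) for a in S3}
```

There are two cosets of A3 and three cosets of S. The rule $aN * bN = abN$ passes the check for A3 but fails for S, so only the first family of cosets becomes a quotient group.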

There is an isomorphism theorem for groups analogous to the isomorphism 
theorem for commutative rings. 

Theorem 9.12 (First Isomorphism Theorem). If $\varphi\colon G \to H$ is a group homomorphism,
then $\operatorname{im}\varphi$ is a subgroup of $H$, $N = \ker\varphi$ is a normal subgroup
of $G$, and there is an isomorphism $\Phi\colon G/N \to \operatorname{im}\varphi$ given by $\Phi\colon aN \mapsto \varphi(a)$.

Sketch of proof. Adapt the proof of the First Isomorphism Theorem for commutative
rings. ■


How to Think About It. Without a doubt, this section contains too much 
new material; there’s too much to digest. Fortunately, you have seen analogs 
of these definitions for commutative rings so, at least, they sound familiar. You 
can now sympathize with the members of the Academy in Paris in 1830 as 
they struggled, without benefit of ever having seen any abstract algebra at all, 
to read such things in the paper Galois submitted to them! 


Let us now see why normal subgroups are important for polynomials. If
$k \subseteq B \subseteq E$, then $\mathrm{Gal}(E/B)$ is a subset of $\mathrm{Gal}(E/k)$:

$\mathrm{Gal}(E/B) = \{\sigma \in \mathrm{Gal}(E/k) : \sigma(b) = b \text{ for all } b \in B\}.$

It is easy to check that $\mathrm{Gal}(E/B)$ is a subgroup of $\mathrm{Gal}(E/k)$.

Theorem 9.13. Let $k \subseteq B \subseteq E$ be a tower of fields, where $E/k$ is a splitting
field of some $f(x) \in k[x]$. If $B$ is the splitting field of some $g(x) \in k[x]$, then
$\mathrm{Gal}(E/B)$ is a normal subgroup of $\mathrm{Gal}(E/k)$, and

$\mathrm{Gal}(E/k)/\mathrm{Gal}(E/B) \cong \mathrm{Gal}(B/k).$

Sketch of proof. Define $\varphi\colon \mathrm{Gal}(E/k) \to \mathrm{Gal}(B/k)$ by $\varphi\colon \sigma \mapsto \sigma|B$. The
restriction $\sigma|B$ does send $B$ to itself, since $B$ is a splitting field (so automorphisms
permute the roots of $g$, by Theorem 9.4), and so $\sigma|B \in \mathrm{Gal}(B/k)$.
By Theorem 9.12, the First Isomorphism Theorem, it suffices to find $\operatorname{im}\varphi$ and
$\ker\varphi$. It is obvious that $\mathrm{Gal}(E/B) \subseteq \ker\varphi$, and a short calculation gives equality.
We can prove that $\varphi$ is surjective and hence $\operatorname{im}\varphi = \mathrm{Gal}(B/k)$ (the proof
of surjectivity ([26] p. 455) is not straightforward). ■

The converse of Theorem 9.13 is true: the Fundamental Theorem of Galois
Theory says, in part, that if $N$ is a normal subgroup of $\mathrm{Gal}(E/k)$, then there
is a subfield

$B = \{a \in E : \sigma(a) = a \text{ for all } \sigma \in N\} \subseteq E$

that is a splitting field of some polynomial in $k[x]$.


How to Think About It. The subgroups get smaller as the field extensions
get bigger. If $K \subseteq L \subseteq E$, then $\mathrm{Gal}(E/L) \subseteq \mathrm{Gal}(E/K)$: if $\sigma$ is an
automorphism of $E$ that fixes everything in $L$, then surely $\sigma$ fixes
everything in $K \subseteq L$.


Lemma 9.14. Let $K \subseteq L \subseteq E$ be a tower of fields, where $K$ contains all roots
of unity. If $L/K$ is a pure extension, then $\mathrm{Gal}(E/L)$ is a normal subgroup of
$\mathrm{Gal}(E/K)$ and the quotient group $\mathrm{Gal}(E/K)/\mathrm{Gal}(E/L)$ is abelian.

Sketch of proof. The field extension $L/K$ is a splitting field (because the
subfield $K$ contains all needed roots of unity), and so Theorem 9.13 gives
$\mathrm{Gal}(E/L)$ a normal subgroup of $\mathrm{Gal}(E/K)$ with quotient group isomorphic
to $\mathrm{Gal}(L/K)$. By Proposition 9.10, the quotient group is abelian. ■


It is not true that if
$A$ is a normal subgroup
of $B$ and $B$ is a normal
subgroup of $C$, then $A$ is a
normal subgroup of $C$.

We conclude the story by applying Lemma 9.14 to each pure extension
$K_i/K_{i-1}$ in a tower of a radical extension.

Lemma 9.15. Let $k$ be a field containing all roots of unity. If $f(x) \in k[x]$ is
solvable by radicals, then there is a chain of subgroups

$G_0 = \mathrm{Gal}(K_t/k) \supseteq G_1 \supseteq G_2 \supseteq \cdots \supseteq G_t = \{1\},$

where each $G_{i+1}$ is a normal subgroup of $G_i$ and each quotient group $G_i/G_{i+1}$
is abelian.

This lemma suggests the following definition. 

Definition. A group $G$ is solvable if there is a chain of subgroups

$G = G_0 \supseteq G_1 \supseteq G_2 \supseteq \cdots \supseteq G_t = \{1\},$

where each $G_{i+1}$ is a normal subgroup of $G_i$ and each quotient group $G_i/G_{i+1}$
is abelian.

Clearly, every abelian group is solvable: take $G_1 = \{1\}$. It is shown in
[26], p. 466, that the symmetric group $S_4$ and all its subgroups are solvable
groups, but that $S_5$ is not solvable.
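
For groups as small as $S_3$, $S_4$, and $S_5$, solvability can be tested by computer. The sketch below does not search for a chain directly; it uses a standard equivalent criterion that the text does not discuss: a finite group is solvable exactly when its derived series (repeatedly passing to the subgroup generated by all commutators $aba^{-1}b^{-1}$) reaches $\{1\}$. All function names are our own, and permutations of $\{0, \ldots, n-1\}$ are stored as tuples.

```python
from itertools import permutations

def compose(s, t):
    """The composite 's after t': apply t first, then s."""
    return tuple(s[t[i]] for i in range(len(t)))

def inverse(s):
    """The inverse permutation of s."""
    inv = [0] * len(s)
    for i, image in enumerate(s):
        inv[image] = i
    return tuple(inv)

def generated_subgroup(generators, n):
    """Smallest subgroup of S_n containing the generators (closure under products)."""
    identity = tuple(range(n))
    elements = {identity}
    frontier = [identity]
    while frontier:
        x = frontier.pop()
        for g in generators:
            y = compose(x, g)
            if y not in elements:
                elements.add(y)
                frontier.append(y)
    return elements

def derived_subgroup(G, n):
    """Subgroup generated by all commutators a b a^{-1} b^{-1} with a, b in G."""
    commutators = {compose(compose(a, b), compose(inverse(a), inverse(b)))
                   for a in G for b in G}
    return generated_subgroup(commutators, n)

def is_solvable(n):
    """Follow the derived series of S_n until it reaches {1} or stops shrinking."""
    G = set(permutations(range(n)))
    while len(G) > 1:
        H = derived_subgroup(G, n)
        if len(H) == len(G):
            return False    # the series is stuck at a nontrivial group
        G = H
    return True
```

With these definitions, is_solvable(3) and is_solvable(4) return True, while is_solvable(5) returns False: the derived series of $S_5$ stops at a subgroup of order 60 that equals its own derived subgroup.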

Using these ideas, Galois proved the following beautiful theorem. 

Theorem 9.16 (Galois). Let $k$ be a field and $f(x) \in k[x]$. If $f$ is solvable by
radicals, then its Galois group is a solvable group. If $k$ has characteristic 0,
then the converse is true.


Proof. [25], p. 189 and p. 208. ■

Galois’ Theorem explains why the classical theorems hold for polynomials
of degree $\le 4$.

Corollary 9.17. If $k$ is a field of characteristic 0 and $f(x) \in k[x]$ has degree
$\le 4$, then $f$ is solvable by radicals.


Ruffini’s name occurs here 
because he published 
a proof of this result in 
1799. Although his ideas 
were correct, there were 
gaps in his proof, and it 
was not accepted by his 
contemporaries. 


Proof. Since $\deg(f) \le 4$, Theorem 9.11 says that $\mathrm{Gal}(f)$ is (isomorphic to) a
subgroup of $S_4$ and, hence, it is a solvable group. Theorem 9.16 now says that
$f$ is solvable by radicals. ■

Finally, the next theorem explains why degree 5 was so troublesome. 

Corollary 9.18 (Abel-Ruffini). The general polynomial of degree 5 is not 
solvable by radicals. 

Proof. The Galois group of the general quintic $f(x) \in \mathbb{Q}[x]$ is $S_5$ ([26],
p. 468), which is not a solvable group, and so Galois’ Theorem says that $f$
is not solvable by radicals. ■

Here is an explicit numerical example. The quintic

$f(x) = x^5 - 4x + 2 \in \mathbb{Q}[x]$

(see Figure 9.3) is not solvable by radicals because its Galois group is $S_5$ ([26],
p. 469).

Figure 9.3. $f(x) = x^5 - 4x + 2$.

Corollary 9.18 is often misquoted. It says the general quintic is not solvable 
by radicals: there is no formula involving only addition, subtraction, multipli- 
cation, division, and extraction of roots that expresses the roots of the general 
quintic polynomial in terms of its coefficients. But it doesn’t say that roots of 
quintics cannot be found. There are other kinds of formulas; for example, Newton’s
method gives the roots as $\lim_{n\to\infty} x_n$, where $x_{n+1} = x_n - f(x_n)/f'(x_n)$.
Thus, it is not accurate to say that there is no formula finding the roots of a 
quintic polynomial. 
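As an illustration only, here is a minimal Python sketch of Newton's method applied to the quintic $x^5 - 4x + 2$ of Figure 9.3. The function names, the starting guesses, and the stopping tolerance are our own choices, not part of the text; convergence of Newton's method depends on the starting guess, so this is a sketch rather than a general root finder.

```python
def f(x):
    # The quintic from Figure 9.3
    return x**5 - 4*x + 2

def f_prime(x):
    # Its derivative
    return 5*x**4 - 4

def newton(x0, steps=50, tol=1e-14):
    """Iterate x_{n+1} = x_n - f(x_n)/f'(x_n), starting from the guess x0."""
    x = x0
    for _ in range(steps):
        step = f(x) / f_prime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Starting guesses (assumed, chosen by looking at the graph) near the three real roots
roots = [newton(x0) for x0 in (-2.0, 0.5, 1.5)]
print(roots)
```

Each starting guess leads to a different real root; the remaining two roots of the quintic are not real and would need complex starting values.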

Exercises 

9.4 Prove that every subgroup of an abelian group is abelian. 

9.5 Let $f(x), g(x) \in \mathbb{Q}[x]$ be solvable by radicals.

(i) Show that $f(x)g(x)$ is also solvable by radicals.

(ii) Give an example showing that $f(x) + g(x)$ need not be solvable by radicals.

9.6 Assuming that $x^n - 1$ is solvable by radicals, prove that $x^n - a$ is solvable by
radicals, where $a \in \mathbb{Q}$.

9.7 Prove that $S_3$ is a solvable group and that it is not abelian.

9.8 Recall Exercise 1.56 on page 35: if $m \ge 2$ is an integer, $\gcd(k, m) = 1$, and
$\gcd(k', m) = 1$, then $\gcd(kk', m) = 1$.

Prove that
$$U_m = \{[k] \in \mathbb{Z}_m : \gcd(k, m) = 1\}$$
is a group under multiplication.

9.9 If $k$ is a field, prove that $k^\times = \{a \in k : a \ne 0\}$ is a group under multiplication.

$\mathrm{GL}_2(k)$ is called the
General Linear group, and
$\mathrm{SL}_2(k)$ is called the Special
Linear group.


The “laws of exponents” 
from high school algebra 
preview the results of 
Exercise 9.13. 


9.10 If $R$ is a commutative ring, prove that $R$ is an abelian group under addition. (Note
that 0 is the identity element and that $-a$ is the (additive) inverse of $a$.)

9.11 Let $k$ be a field.

(i) Prove that $k^\times$ is an abelian group under multiplication, where $k^\times$ denotes the
set of nonzero elements of $k$.

(ii) Prove that $\mathrm{GL}_2(k)$, the set of all $2 \times 2$ nonsingular matrices with entries in $k$,
is a group under matrix multiplication.

(iii) Prove that the determinant function,
$$\det\colon \mathrm{GL}_2(k) \to k^\times,$$
is a surjective homomorphism of groups.

(iv) Prove that $\ker(\det) = \mathrm{SL}_2(k)$, the set of all $2 \times 2$ matrices over $k$ having
determinant 1.

(v) Prove that $\mathrm{GL}_2(k)/\mathrm{SL}_2(k) \cong k^\times$.

9.12 (i) Prove that $\mathbb{R}$ is an abelian group with addition as binary operation.

(ii) Prove that $\mathbb{Q}$ is an abelian group with addition as binary operation; indeed, it
is a subgroup of $\mathbb{R}$.

(iii) Let $\mathbb{R}^>$ be the set of positive real numbers. Show that $\mathbb{R}^>$ is a group with
multiplication as binary operation.

9.13 (i) Prove that $\exp\colon \mathbb{R} \to \mathbb{R}^>$, defined by $a \mapsto e^a$, is a group homomorphism.

(ii) Prove that $\log\colon \mathbb{R}^> \to \mathbb{R}$, defined by $b \mapsto \log b$, is a group homomorphism.

(iii) Prove that exp is an isomorphism by showing that its inverse is log.

9.14 (i) Prove that $\mathbb{R}^>$, the set of all positive real numbers, is an abelian group with
multiplication as binary operation, and prove that $\mathbb{Q}^>$, the set of all positive
rational numbers, is a subgroup of $\mathbb{R}^>$.

(ii) Prove that $\mathbb{Z}[x]$ is an abelian group under addition.

(iii) Use the Fundamental Theorem of Arithmetic to prove that the additive group
$\mathbb{Z}[x]$ is isomorphic to the multiplicative group $\mathbb{Q}^>$ of all positive rational
numbers.

Hint. Define $\varphi\colon \mathbb{Z}[x] \to \mathbb{Q}^>$ by
$$\varphi\colon e_0 + e_1 x + \cdots + e_n x^n \mapsto p_0^{e_0} p_1^{e_1} \cdots p_n^{e_n},$$
where $p_0 = 2$, $p_1 = 3$, $p_2 = 5$, . . . is the list of all primes.
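To get a feel for the map in the hint to Exercise 9.14(iii), here is a small Python sketch that evaluates it on polynomials of low degree and checks, on one example, that a sum of polynomials is sent to a product of rationals. The names `phi`, `add_poly`, and the truncated list `PRIMES` are ours, and a numerical check is of course not a proof.

```python
from fractions import Fraction

PRIMES = [2, 3, 5, 7, 11, 13]  # p_0, p_1, ...; enough for polynomials of degree <= 5

def phi(coeffs):
    """Send e_0 + e_1 x + ... + e_n x^n, given as [e_0, ..., e_n], to p_0^e_0 * ... * p_n^e_n."""
    if len(coeffs) > len(PRIMES):
        raise ValueError("extend PRIMES to handle this degree")
    result = Fraction(1)
    for p, e in zip(PRIMES, coeffs):
        result *= Fraction(p) ** e   # negative exponents give denominators
    return result

def add_poly(f, g):
    """Add two integer polynomials given as coefficient lists."""
    n = max(len(f), len(g))
    f = f + [0] * (n - len(f))
    g = g + [0] * (n - len(g))
    return [a + b for a, b in zip(f, g)]

f = [1, -2, 0, 1]   # 1 - 2x + x^3
g = [-1, 2, 3]      # -1 + 2x + 3x^2
print(phi(f), phi(g), phi(add_poly(f, g)))
```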


9.5 Wiles and Fermat’s Last Theorem 

Andrew Wiles proved Fermat’s Last Theorem in 1995: Modular elliptic curves 
and Fermat’s last theorem, Ann. Math. (2) 141 (1995), pp. 443-551. He has 
said, 

I was a ten year old and one day I happened to be looking in my local 
public library and I found a book on maths and it told a bit about the 
history of this problem and I, a ten year old, could understand it. From
that moment I tried to solve it myself it was such a challenge, such a 
beautiful problem. This problem was Fermat’s Last Theorem. 


and

There’s no other problem that will mean the same to me. I had this very
rare privilege of being able to pursue in my adult life what had been my 
childhood dream. I know it’s a rare privilege, but I know if one can do 
this it’s more rewarding than anything one can imagine. 

Andrew Wiles was born in Cambridge, England in 1953. He was awarded
a doctorate in 1980 from the University of Cambridge, then spent a year in
Bonn before joining the Institute for Advanced Study in Princeton. In 1982,
he was appointed Professor at Princeton. Around 1985, Wiles learned that the
Shimura-Taniyama-Weil conjecture about elliptic curves, if true, would imply
Fermat’s Last Theorem (we will say more about this below). Wiles was
able to prove a special case of this conjecture (the full conjecture was proved
in 2001) which was strong enough to give Fermat’s Last Theorem.

In 1994, Wiles was appointed Eugene Higgins Professor of Mathematics at 
Princeton. Wiles received many honors for his outstanding work. For exam- 
ple, he was awarded the Schock Prize in Mathematics from the Royal Swedish 
Academy of Sciences, the Prix Fermat from Universite Paul Sabatier, the Wolf 
Prize in Mathematics from the Wolf Foundation in Israel, and the Cole Prize 
from the American Mathematical Society. He was elected a member of the 
National Academy of Sciences of the United States, receiving its mathematics 
prize, and Andrew Wiles became “Sir Andrew Wiles” when he was knighted 
by the Queen of England. In 1998, because he was over forty years of age and
therefore not eligible for a Fields Medal (the mathematics prize equivalent to a
Nobel prize), he was presented with a silver plaque by the International
Mathematical Union at the International Congress of Mathematicians.


Elliptic Integrals and Elliptic Functions 

The context of Wiles’ proof of Fermat’s Last Theorem is that of elliptic curves, 
an area with an interesting history. Leibniz, one of the founders of calcu- 
lus, posed the problem of determining which integrals could be expressed in 
“closed form;” that is, as linear combinations of familiar functions such as ra- 
tional functions, exponentials, logarithms, trigonometric functions, and their 
inverse functions. One of the first integrals that could not be so expressed (al- 
though the proof of this fact, by Liouville, waited until 1833) is the arclength 
of an ellipse. If f(x,y) = 0 is the equation of a curve in the plane, then its 
arclength is given in terms of the indefinite integral 

$$\int \sqrt{1 + (dy/dx)^2}\,dx.$$

Consider the ellipse with equation
$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1,$$
where $a > b > 0$. We have $y = b\sqrt{1 - (x^2/a^2)}$, so that
$$\frac{dy}{dx} = \frac{-bx}{a^2\sqrt{1 - (x^2/a^2)}}$$
and the arclength integral is
$$\frac{1}{a}\int \sqrt{\frac{a^4 - (a^2 - b^2)x^2}{a^2 - x^2}}\,dx.$$


We refer the reader to 
the books of Siegel [30], 
Silverman-Tate [31], and 
Stillwell [32], as well 
as to the article by M. 
Rosen, Abel’s Theorem 
on the Lemniscate, Amer. 
Math. Monthly 88 (1981), 
pp. 387-395, for further 
details of this discussion.

The eccentricity of the ellipse is
$$E = \sqrt{1 - (b/a)^2}.$$
Make the substitution $x = a\sin\theta$, so that $\cos\theta = \frac{1}{a}\sqrt{a^2 - x^2}$ and
$dx = a\cos\theta\,d\theta$, to obtain
$$\frac{1}{a}\int \sqrt{\frac{a^4 - (a^2 - b^2)x^2}{a^2 - x^2}}\,dx = a\int \sqrt{1 - E^2\sin^2\theta}\,d\theta.$$
Finally, we rewrite the last integral using the tangent half-angle formula
$t = \tan(\theta/2)$ in Chapter 1 (so that $d\theta = 2\,dt/(1 + t^2)$ and $\sin\theta = 2t/(1 + t^2)$).
We obtain
$$a\int \sqrt{1 - E^2\sin^2\theta}\,d\theta = 2a\int \frac{\sqrt{g(t)}}{(1 + t^2)^2}\,dt,$$
where $g(t) = t^4 + (2 - 4E^2)t^2 + 1$. Thus, if $R(x, y)$ is the rational function
in two variables,
$$R(x, y) = \frac{y}{(1 + x^2)^2},$$
then the arclength of an ellipse has the form
$$2a\int R\bigl(t, \sqrt{g(t)}\bigr)\,dt,$$
where $g(t)$ is a quartic polynomial. A similar integral arises from the arclength
of the hyperbola $x^2/a^2 - y^2/b^2 = 1$.
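The two forms of the arclength integral can be compared numerically. The Python sketch below is our own illustration: it uses a hand-written composite Simpson rule (any quadrature would do) and the function names are hypothetical. It computes the quarter arc of the ellipse in the first quadrant, where $\theta$ runs from 0 to $\pi/2$ and, correspondingly, $t = \tan(\theta/2)$ runs from 0 to 1.

```python
import math

def simpson(f, lo, hi, n=2000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

def quarter_arc_theta(a, b):
    """a * integral of sqrt(1 - E^2 sin^2(theta)) over [0, pi/2]."""
    E2 = 1 - (b / a) ** 2
    return a * simpson(lambda th: math.sqrt(1 - E2 * math.sin(th) ** 2), 0.0, math.pi / 2)

def quarter_arc_t(a, b):
    """2a * integral of sqrt(g(t)) / (1 + t^2)^2 over [0, 1], with g as in the text."""
    E2 = 1 - (b / a) ** 2
    g = lambda t: t**4 + (2 - 4 * E2) * t**2 + 1
    return 2 * a * simpson(lambda t: math.sqrt(g(t)) / (1 + t * t) ** 2, 0.0, 1.0)

print(quarter_arc_theta(2.0, 1.0), quarter_arc_t(2.0, 1.0))
```

When $a = b$ the ellipse is a circle, $E = 0$, and both integrals reduce to the familiar quarter circumference $\pi a/2$.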


Definition. An elliptic integral is an indefinite integral of the form
$$\int R\bigl(t, \sqrt{g(t)}\bigr)\,dt,$$
where $R(x, y)$ is a rational function and $g(t)$ is either a cubic or a quartic
polynomial having no repeated roots.

The substitution $t = 1/u$
transforms the cubic
integrand
$$\frac{dt}{\sqrt{(t - a)(t - b)(t - c)}}$$
into the quartic
$$\frac{-du}{\sqrt{u(1 - ua)(1 - ub)(1 - uc)}}.$$

These integrals are so called because, as we have just seen, the arclength
of an ellipse was one of the first examples of them. Another example of an
elliptic integral, studied by Jacob Bernoulli in 1679, arises from computing the
arclength of a spiral. In 1694, James Bernoulli examined the shape an elastic
rod takes if its ends are compressed; he found the resulting curve to be
the lemniscate $r^2 = \cos 2\theta$; see Figure 9.4 (there are eight mathematicians
in the Bernoulli family, in the seventeenth and eighteenth centuries, listed in
the MacTutor History of Mathematics Archive). Recall that the arclength of a
curve $r = f(\theta)$ in polar coordinates is
$$\int \sqrt{1 + r^2(d\theta/dr)^2}\,dr.$$
If $r^2 = \cos 2\theta$, then
$$1 + r^2(d\theta/dr)^2 = 1 + \frac{r^4}{\sin^2 2\theta} = 1 + \frac{r^4}{1 - \cos^2 2\theta} = 1 + \frac{r^4}{1 - r^4} = \frac{1}{1 - r^4}.$$
Therefore, the arclength of the lemniscate is
$$\int \frac{dr}{\sqrt{1 - r^4}}.$$
Yet another example,
$$\int \frac{dt}{\sqrt{(1 - t^2)(1 - at^2)}},$$
arises when calculating the period of oscillation of a simple pendulum.
Computing the electrical capacity of an ellipsoid with equation
$x^2/a^2 + y^2/b^2 + z^2/c^2 = 1$ involves the integral
$$\int \frac{dt}{\sqrt{(a^2 + t)(b^2 + t)(c^2 + t)}},$$
an elliptic integral involving a cubic.

Since there are interesting elliptic integrals and they are difficult to evaluate,
they were the subject of much investigation. In 1718, Fagnano proved a
Duplication Formula:
$$2\int_0^u \frac{dt}{\sqrt{1 - t^4}} = \int_0^{Q(u)} \frac{dt}{\sqrt{1 - t^4}},$$
where $Q(u) = 2u\sqrt{1 - u^4}/(1 + u^4)$. In proving this, Fagnano inverted
$I(x) = \int_0^x 1/\sqrt{1 - t^4}\,dt$, getting the inverse function
$I^{-1}(x) = \sqrt{2}\,x/\sqrt{1 + x^4}$. In 1751, Euler generalized the duplication
formula of Fagnano, obtaining an Addition Theorem:
$$\int_0^u \frac{dt}{\sqrt{1 - t^4}} + \int_0^v \frac{dt}{\sqrt{1 - t^4}} = \int_0^{P(u,v)} \frac{dt}{\sqrt{1 - t^4}},$$
where $P(u, v) = \bigl(u\sqrt{1 - v^4} + v\sqrt{1 - u^4}\bigr)/(1 + u^2v^2)$ (so that $P(u, u) = Q(u)$).
Euler further generalized this by replacing the integrand by $1/\sqrt{p(t)}$,
where $p(t)$ is any quartic polynomial.
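The duplication formula and the addition theorem can be checked numerically for small upper limits. The following Python sketch is our own illustration; the quadrature routine and the names `I`, `P`, and `Q` (mirroring the notation above) are assumptions, and the sample values $u = 0.3$, $v = 0.4$ are chosen so that all limits of integration stay well inside the interval $[0, 1)$ where the integrand is smooth.

```python
import math

def simpson(f, lo, hi, n=2000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

def I(u):
    """The lemniscatic integral from 0 to u of dt / sqrt(1 - t^4)."""
    return simpson(lambda t: 1 / math.sqrt(1 - t**4), 0.0, u)

def P(u, v):
    """Upper limit in Euler's addition theorem."""
    return (u * math.sqrt(1 - v**4) + v * math.sqrt(1 - u**4)) / (1 + u * u * v * v)

def Q(u):
    """Upper limit in Fagnano's duplication formula."""
    return 2 * u * math.sqrt(1 - u**4) / (1 + u**4)

u, v = 0.3, 0.4
print(I(u) + I(v), I(P(u, v)))   # the two sides of the addition theorem
print(2 * I(u), I(Q(u)))         # the two sides of the duplication formula
```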
In 1797, Gauss considered the elliptic integrals $\int \frac{dt}{\sqrt{1 - t^3}}$ and $\int \frac{dt}{\sqrt{1 - t^4}}$.
He saw an analogy (as, most likely, did Fagnano and Euler) with
$$\sin^{-1} u = \int_0^u \frac{dt}{\sqrt{1 - t^2}},$$
and he inverted many elliptic integrals; after all, $\sin x$ is the inverse function
of $\sin^{-1} x$. Nowadays, inverse functions of elliptic integrals are called elliptic
functions. Just as $\sin x$ is periodic, that is, $\sin(x + 2\pi) = \sin x$ for all $x$, so,
too, are elliptic functions $f$; there is some number $p$ with $f(x + p) = f(x)$
for all $x$. Gauss then studied complex elliptic integrals $I(z)$; their
inverse functions $f(z) = I^{-1}(z)$ are called elliptic functions of a complex
variable. Gauss saw that complex elliptic functions are doubly periodic: there
are (noncollinear) complex numbers $p$ and $q$ with
$$f(z + mp + nq) = f(z)$$
for all complex $z$ and all $m, n \in \mathbb{Z}$. This fact has important geometric
consequences, both for elliptic functions and for complex variables in general. Alas,
Gauss never published these ideas, and they became known only later.

In 1823, Abel investigated elliptic functions, rediscovered many of Gauss’s
theorems, and proved new beautiful results about them. For example, just as
Gauss had found all $n$ for which one can divide the circle into $n$ equal arcs
using ruler and compass ($n = 2^m p_1 \cdots p_k$, where $m \ge 0$ and the $p_i$ are
distinct primes of the form $2^{2^t} + 1$), Abel obtained the same result (for exactly
the same $n$) for the lemniscate. At the same time, Jacobi began his investigations
of elliptic functions, further explaining and generalizing work of Euler
by introducing theta functions and modular curves.

Congruent Numbers Revisited 

The search for congruent numbers can be viewed as the search for “generalized
Pythagorean triples”: right triangles with rational side-lengths and integer
area. Recall from Chapter 1 that a congruent number is a positive integer $n$
that arises from asking which integers are areas of right triangles having rational
side-lengths; that is, there are positive rational numbers $a$, $b$, and $c$ such
that
$$a^2 + b^2 = c^2 \quad\text{and}\quad \tfrac{1}{2}ab = n.$$

Since $ab = 2n > 0$, we
have $a \ne 0$ and $b \ne 0$. It
follows that $c \ne 0$, too.

In Theorem 1.9, we
reduced the defining pair
(for $n = 2$) to a degree 4
equation in three variables.
We’ll do a little better here.

Let’s loosen the constraints a bit and allow a, b, and c to be negative rational 
numbers as well. We’d like to replace the two equations in four unknowns with 
a simpler set of constraints. We’ll see that the solution can be realized as the 
search for rational points on a polynomial curve. 

We now turn the pair of defining equations into a single equation in two
variables. The equation $a^2 + b^2 = c^2$ can be written as
$$b^2 = c^2 - a^2 = (c - a)(c + a).$$
Let $k = c - a$, so that we have
$$b^2 = k(c + a).$$
Since $c = k + a$, this is the same as
$$b^2 = k(k + 2a) = k^2 + 2ak,$$
or
$$2ak = b^2 - k^2.$$
Since $ab = 2n$, we have $a = 2n/b$, so this is equivalent to
$$\frac{4nk}{b} = b^2 - k^2,$$
or
$$4nk = b^3 - k^2b.$$
This is beginning to look like a cubic in $b$. To homogenize it, multiply both
sides by $(n/k)^3$ to get
$$\left(\frac{2n^2}{k}\right)^2 = \left(\frac{bn}{k}\right)^3 - n^2\left(\frac{bn}{k}\right);$$
remembering that $k = c - a$, we have a single cubic equation satisfied by $a$,
$b$, $c$, and $n$:
$$\left(\frac{2n^2}{c - a}\right)^2 = \left(\frac{bn}{c - a}\right)^3 - n^2\left(\frac{bn}{c - a}\right).$$
This shows that if $a^2 + b^2 = c^2$ and $ab = 2n$, then
$$\left(\frac{bn}{c - a},\ \frac{2n^2}{c - a}\right)$$
is a rational point on the graph of $y^2 = x^3 - n^2x$ (the graph of $y^2 = x^3 - 25x$
is shown in Figure 9.5).

Except for sign changes and points on the x-axis, the correspondence goes
both ways.

We’re assuming that
$c - a \ne 0$, or else $b = 0$.




402 Chapter 9 Epilog 


Theorem 9.19. Let $n$ be a positive integer. There’s a bijection between triples
of rational numbers $(a, b, c)$ satisfying
$$a^2 + b^2 = c^2 \quad\text{and}\quad ab = 2n$$
and rational points on the graph of $y^2 = x^3 - n^2x$ with $y \ne 0$.

Proof. The calculation preceding the statement of the theorem shows that a
triple produces such a point on the graph. Going the other way, if $(x, y)$ is a
point on the graph with $y \ne 0$, we can solve the system
$$x = \frac{bn}{c - a}, \qquad y = \frac{2n^2}{c - a}, \qquad a^2 + b^2 = c^2$$
for $a$, $b$, and $c$, either by hand or CAS (see Figure 9.6), to find
$$a = \frac{x^2 - n^2}{y}, \qquad b = \frac{2nx}{y}, \qquad c = \frac{x^2 + n^2}{y}.$$
It is easily checked that this produces a triple of rational numbers of the desired
type. ■



The first part of what
the CAS returns (for
$(x^2 - cy - n^2)/y \ne 0$),
combined with the value
given for $c$, just says that
$n \ne 0$.



Let’s state the conversion formulas explicitly. 


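Corollary 9.20. The bijection guaranteed by Theorem 9.19 is given explicitly
by
$$(a, b, c) \mapsto \left(\frac{bn}{c - a},\ \frac{2n^2}{c - a}\right) \quad\text{and}\quad (x, y) \mapsto \left(\frac{x^2 - n^2}{y},\ \frac{2nx}{y},\ \frac{x^2 + n^2}{y}\right).$$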
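The conversion formulas of Corollary 9.20 are easy to experiment with in exact rational arithmetic. The Python sketch below is our own illustration (the function names are hypothetical); it converts the triangle with sides $(3/2, 20/3, 41/6)$ and area $n = 5$ into a point on $y^2 = x^3 - 25x$ and back again.

```python
from fractions import Fraction as F

def triple_to_point(a, b, c, n):
    """(a, b, c) -> (bn/(c - a), 2n^2/(c - a)); assumes c - a != 0."""
    k = c - a
    return (b * n / k, 2 * n * n / k)

def point_to_triple(x, y, n):
    """(x, y) -> ((x^2 - n^2)/y, 2nx/y, (x^2 + n^2)/y); assumes y != 0."""
    return ((x * x - n * n) / y, 2 * n * x / y, (x * x + n * n) / y)

n = 5
triangle = (F(3, 2), F(20, 3), F(41, 6))   # a^2 + b^2 = c^2 and ab = 2n = 10
x, y = triple_to_point(*triangle, n)
print(x, y)                                # the corresponding point on the curve
print(point_to_triple(x, y, n))            # recovers the original triple
```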


Example 9.21. The correspondence between rational right triangles with integer
area and cubic curves allows us to generate infinitely many rational right
triangles with the same area from a given such triangle.

For example, on page 18, we saw that there are two rational right triangles
with area 5. One comes from a scaled copy of $\triangle(9, 40, 41)$, whose area is $5 \cdot 6^2$.
The second is a scaled copy of $\triangle(2420640, 2307361, 3344161)$, whose area is
$5 \cdot 747348^2$; finding it by direct search would take a very long time (even with
a computer). But we can use an idea related to the “sweeping lines” method of
Diophantus. The rational right triangle with side-lengths
$\left(\frac{3}{2}, \frac{20}{3}, \frac{41}{6}\right)$ corresponds, via the formulas in
Corollary 9.20, to the point $P = \left(\frac{25}{4}, \frac{75}{8}\right)$ on the curve $C$ defined by
$$y^2 = x^3 - 25x.$$

The idea is to take the line tangent to $C$ at $P$; it intersects $C$ in a second
point $P'$, which is also rational because $P$ is (see Exercise 9.16 below). From
this new rational point, we can build a new right triangle.

From $y^2 = x^3 - 25x$, we have, using implicit differentiation, that
$$\frac{dy}{dx} = \frac{3x^2 - 25}{2y}.$$
Using this, we find that the slope of the tangent to $C$ at $P$ is 59/12, and hence
the tangent line to $C$ at $P$ has equation
$$y = \frac{59}{12}\left(x - \frac{25}{4}\right) + \frac{75}{8}.$$

A CAS (or at least a
calculator) is a very useful
tool for these calculations.

Solving the system (see Figure 9.7)
$$y^2 = x^3 - 25x, \qquad y = \frac{59}{12}\left(x - \frac{25}{4}\right) + \frac{75}{8},$$
we get $P$ and
$$P' = \left(\frac{1681}{144},\ \frac{62279}{1728}\right).$$

Finally, using Corollary 9.20 again, we recover $(a, b, c)$ from $P'$:
$$a = \frac{1519}{492}, \qquad b = \frac{4920}{1519}, \qquad c = \frac{3344161}{747348}.$$
These are the side-lengths of the triangle on page 18. ▲
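The computation in Example 9.21 can be reproduced in a few lines of exact arithmetic. The Python sketch below is our own illustration with hypothetical function names. Instead of solving the system with a CAS, it uses a standard shortcut that is not in the text: substituting the tangent line $y = m(x - x_0) + y_0$ into $y^2 = x^3 - n^2x$ gives a cubic in $x$ whose roots are $x_0$, $x_0$, and the new $x$-coordinate, and these three roots sum to $m^2$.

```python
from fractions import Fraction as F

def tangent_point(x, y, n):
    """Second intersection of the tangent at (x, y) with y^2 = x^3 - n^2 x (assumes y != 0)."""
    m = (3 * x * x - n * n) / (2 * y)   # slope, from implicit differentiation
    x2 = m * m - 2 * x                  # the roots x, x, x2 of the cubic sum to m^2
    y2 = m * (x2 - x) + y               # the new point lies on the tangent line
    return (x2, y2)

def point_to_triple(x, y, n):
    """Corollary 9.20: recover (a, b, c) from a point with y != 0."""
    return ((x * x - n * n) / y, 2 * n * x / y, (x * x + n * n) / y)

n = 5
P = (F(25, 4), F(75, 8))
P_prime = tangent_point(*P, n)
a, b, c = point_to_triple(*P_prime, n)
print(P_prime)
print(a, b, c)
```

Applying `tangent_point` again to the new point produces further rational right triangles with area 5, with rapidly growing numerators and denominators.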







This definition is not 
quite accurate, for an 
elliptic curve is really an 
equivalence class of such 
curves. 


Curves over C are two- 
dimensional surfaces when 
viewed over R. 


Exercises 

9.15 Show that there are no rational points $(x, y)$ with $y \ne 0$ on the graph of

(i) $y^2 = x^3 - x$

(ii) $y^2 = x^3 - 4x$

9.16 Show that a cubic equation with rational coefficients and two rational roots has,
in fact, three rational roots.

9.17 Find a third rational right triangle with area 5, different from the two we found in
Example 9.21.

Elliptic Curves 

The curves defined by the equation in Theorem 9.19,
$$y^2 = \text{a cubic polynomial in } x,$$
show up all across mathematics. We just saw how they can be used to generate
congruent numbers.

Before that, we saw that the integral defining arcsine, $\int dt/\sqrt{1 - t^2}$,
suggested studying elliptic functions, the inverse functions of elliptic integrals.
Just as the unit circle is parametrized by sine and cosine (it consists of the
points $(\sin\theta, \cos\theta)$), Gauss, Abel, and Jacobi considered curves parametrized
by elliptic functions; that is, curves consisting of the points $(f(u), f'(u))$,
where $f$ is an elliptic function (cosine is the derivative of sine). What sort of
curves are these? Expand the integrand of an elliptic integral as a power series
(since it has a denominator, the series begins with a negative power), and then
integrate term by term. There results a differential equation involving $x = f$
and $y = f'$, which turns out to be a cubic in two variables (see [9], pp. 17-19).
After some manipulations, one obtains a Weierstrass normal form for the
points $(x, y)$ on the curve $y^2 = ax^3 + bx^2 + cx + d$ (there is another, simpler,
Weierstrass normal form, $y^2 = 4x^3 - g_2x - g_3$, where $g_2$, $g_3$ are constants).

Definition. An elliptic curve over a field $k$ is a curve $C \subseteq k^2$ with equation
$$y^2 = g(x),$$
where $g(x) = ax^3 + bx^2 + cx + d \in k[x]$ has no repeated roots.

The most interesting elliptic curves are over $\mathbb{C}$ (for complex variables) or
over $\mathbb{Q}$ (for number theory), while elliptic curves over finite fields $\mathbb{F}_q$ give rise
to public access codes that are more secure than the RSA codes we discussed
in Chapter 4.

Elliptic functions and elliptic curves, whose humble origins are in arclength 
problems, occur in analysis, geometry, and complex variables. In the previous 
subsection, we saw that congruent numbers lead to rational points on elliptic 
curves. More generally, let’s now see the connection with number theory and 
with Fermat’s Last Theorem in particular. 

Definition. A Diophantine equation is an equation $f(x_1, \ldots, x_m) = 0$, where
$f(x_1, \ldots, x_m) \in \mathbb{Q}[x_1, \ldots, x_m]$ is a polynomial in several variables having
rational coefficients.

A solution to a Diophantine equation $f(x_1, \ldots, x_m) = 0$ is an $m$-tuple
$(q_1, \ldots, q_m) \in \mathbb{Q}^m$ for which $f(q_1, \ldots, q_m) = 0$; an integer solution is a
solution in $\mathbb{Z}^m$.

For example, $x^n + y^n - 1 = 0$ is a Diophantine equation. Rational solutions
$(u, v) = (a/c, b/c)$ give rise, by clearing denominators, to integer solutions of
the Diophantine equation $x^n + y^n = z^n$. Of course, this example arises from
Fermat’s Last Theorem.

A curve in the plane is the locus of solutions to an equation $f(x, y) = 0$.
Let’s focus on polynomials $f(x, y) \in \mathbb{Q}[x, y]$; that is, on Diophantine
equations of two variables. A rational point $(u, v)$ on the curve $f(x, y) = 0$ is a
geometric way of viewing a solution to the Diophantine equation $f(x, y) = 0$.
The method of Diophantus classifies Pythagorean triples by intersecting the
unit circle $x^2 + y^2 = 1$ with lines through the rational point $(-1, 0)$, thereby
parametrizing the circle with rational functions, and then finding the rational
points on the circle that correspond to Pythagorean triples. We saw, in Chapter 1,
that it is worthwhile to generalize the method by replacing the unit circle
by conic sections.

Now pass from conic sections, curves in the plane corresponding to quadratic
polynomials $f(x, y) \in \mathbb{R}[x, y]$, to cubic polynomials of two variables. For
example, rational points on $x^3 + y^3 = 1$ correspond to integer solutions of
$a^3 + b^3 = c^3$, so that the truth of Fermat’s Last Theorem for $n = 3$ says that
the curve $x^3 + y^3 = 1$ has no rational points other than the obvious ones,
$(1, 0)$ and $(0, 1)$.

Diophantus also studied cubic curves. There was no analytic geometry in his
day, and geometry was not explicit in his results. However, both Fermat and,
later, Newton believed that geometry explains Diophantus’s method of finding
solutions of cubic Diophantine equations $f(x, y) = 0$; indeed, Newton called
it the chord-tangent construction. Just as lines usually intersect a conic section
in two points, lines usually intersect cubic curves in three points. If $y = mx + h$
is the equation of a line $L$, then $L$ meets the curve in points $(x, mx + h)$ for
which $f(x, mx + h) = 0$. But $f(x, mx + h)$ is a cubic, and so it has three roots
(if we admit complex numbers). In particular, given rational points $P$, $Q$ on a
cubic curve $C$, say $P = (a, b)$ and $Q = (c, d)$, then the slope $(d - b)/(c - a)$
of the line $L$ they determine is rational. If $C$ has equation $f(x, y) = 0$, where
$f(x, y) \in \mathbb{Q}[x, y]$, then $L$ intersects $C$ in a third point, which is also a rational
point (a cubic with rational coefficients and two rational roots has a rational
third root; see Exercise 9.16). Thus, the chord joining rational points $P$ and $Q$
on $C$ determines a third rational point on $C$ (see Figure 9.8); denote the third
point by
$$P * Q.$$

If we think of the tangent line $T$ to $C$ at $P$ as a limit of chords through $P$,
then it is natural to consider where $T$ meets $C$. The slope of $T$ is also rational:
if
$$A(x)y^3 + B(x)y^2 + C(x)y + D(x) = 0,$$
where $A, B, C, D \in \mathbb{Q}[x]$ have degrees, respectively, 0, 1, 2, 3, then implicit
differentiation gives
$$y'(x, y) = -\frac{B'y^2 + C'y + D'}{3Ay^2 + 2By + C};$$
since the coefficients of $A$, $B$, $C$, $D$ are rational and $P = (a, b)$ is a rational
point, the slope $y'(a, b)$ of $T$ is rational. It follows easily that if $T$ meets $C$,
then the point of intersection is another rational point; denote such a point by
$$P * P$$
(thus, the tangent line $T$ intersects $C$ in another rational point, say $Q$, and the
two points $P$, $Q$ determine a third rational point).

This is a good reason to
consider cubics over the
complex numbers.

If we are considering cubic curves $C$ in the plane, then it is possible that a
line meets $C$ in only one point, not three (a cubic in $\mathbb{R}[x]$ always has a real root,
but its other roots may not be real). To make all work smoothly, we enlarge the
plane to the Riemann sphere $\hat{\mathbb{C}} = \mathbb{R}^2 \cup \{\infty\}$, where we regard $\infty$ as a point
“at infinity.” We agree that lines through $\infty$ are precisely the vertical lines; we
declare that $\infty$ lies on every cubic $C$ and that $\infty$ is a rational point on $C$. Given
two points $P$, $Q$ on the curve, the line they determine meets the curve in a third
rational point $P * Q$. Define $P + Q$ to be the intersection of $C$ with the vertical
line $V$ through $P * Q$; that is, $V$ is the line determined by the two rational
points $\infty$ and $P * Q$ (see Figure 9.8). The wonderful discovery is that this
allows us to “add” points $P$, $Q$ on elliptic curves (indeed, the set of all rational
points on an elliptic curve is an abelian group under this binary operation). In
particular, if $C$ is the elliptic curve arising from the lemniscate (or any of the
elliptic functions considered by Euler), then the limits of integration in Euler’s
Addition Theorem are given by the chord-tangent construction: for example,
$$\int_0^P \frac{dt}{\sqrt{1 - t^4}} + \int_0^Q \frac{dt}{\sqrt{1 - t^4}} = \int_0^{P+Q} \frac{dt}{\sqrt{1 - t^4}}.$$

As we have seen, congruent numbers $n$ arising from rational side-lengths
$(a, b, c)$ of a right triangle correspond to rational points on the elliptic curve
$y^2 = x^3 - n^2x$. The binary operation shows how to construct new congruent
numbers from given ones. The importance of this operation is illustrated by
Theorem 9.19.
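The chord-tangent addition just described can be sketched in Python with exact rational arithmetic. This is our own illustration under stated assumptions: the curve is taken in the special form $y^2 = x^3 + Ax + B$ (leading coefficient 1 and no $x^2$ term, which covers $y^2 = x^3 - n^2x$), the point at infinity is represented by `None`, and the names `add` and `INF` are hypothetical. The third intersection point is found from the fact that the three $x$-coordinates where a line of slope $m$ meets such a curve sum to $m^2$.

```python
from fractions import Fraction as F

INF = None  # stands for the point at infinity

def add(P, Q, A):
    """P + Q on y^2 = x^3 + A x + B, via the chord-tangent construction."""
    if P is INF:
        return Q
    if Q is INF:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and y1 == -y2:
        return INF                       # vertical line: the third point is infinity
    if P == Q:
        m = (3 * x1 * x1 + A) / (2 * y1) # tangent slope, by implicit differentiation
    else:
        m = (y2 - y1) / (x2 - x1)        # chord slope
    x3 = m * m - x1 - x2                 # x-coordinate of P * Q
    y3 = m * (x3 - x1) + y1              # y-coordinate of P * Q (on the line)
    return (x3, -y3)                     # the vertical line through P * Q meets C here

A = -25                                  # the curve y^2 = x^3 - 25x
P = (F(-4), F(6))
Q = (F(0), F(0))
R = (F(-5), F(0))
print(add(P, Q, A))
print(add(add(P, Q, A), R, A), add(P, add(Q, R, A), A))
```

The last line prints the same point twice, a small numerical instance of the associativity that makes the rational points into a group.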

What has this discussion to do with Fermat’s Last Theorem? The abelian
group of rational points on elliptic curves, an example of complex multiplication,
is only the beginning of deep connections between Diophantine equations
and elliptic curves. The following account by the number theorist Andrew
Granville summarizes the recent history.

It all began in 1955, with a question posed by the Japanese mathematician
Yutaka Taniyama: Could one explain the properties of elliptic
curves, equations of the form $y^2 = x^3 + ax + b$ with $a$ and $b$
given whole numbers, in terms of a few well-chosen curves? That is,
is there some very special class of equations that in some way encapsulate
everything there is to know about our elliptic curves? Taniyama
was fairly specific about these very special curves (the so-called modular
curves) and, in 1968, Andre Weil, one of the leading mathematicians
of the twentieth century, made explicit which modular curve should
describe which elliptic curve. In 1971, the first significant proven evidence
in favor of this abstract understanding of equations was given by Goro
Shimura at Princeton University, who showed that it works for a very
special class of equations. This somewhat esoteric proposed approach to
understanding elliptic curves is now known as the Shimura-Taniyama-Weil
conjecture. There the matter stood until 1986, when Gerhard Frey
made the most surprising and innovative link between this very abstract
conjecture and Fermat’s Last Theorem. What he realized was that if
$c^n = a^n + b^n$, then it seemed unlikely that one could understand the
equation $y^2 = x(x - a^n)(x + b^n)$ in the way proposed by Taniyama.
It took deep and difficult reasoning by Jean-Pierre Serre and Ken Ribet
to strengthen Frey’s original concept to the point that a counterexample
to Fermat’s Last Theorem would directly contradict the
Shimura-Taniyama-Weil conjecture.

This is the point where Wiles enters the picture. Wiles drew together a 
vast array of techniques to attack this question. Motivated by extraordinary
new methods of Victor Kolyvagin and Barry Mazur, Wiles established
the Shimura-Taniyama-Weil conjecture for an important class of
examples, including those relevant to proving Fermat’s Last Theorem.
His work can be viewed as a blend of arithmetic and geometry, and
has its origins way back in Diophantus’s Arithmetic. However he employs
the latest ideas from a score of different fields, from the theories
of L-functions, group schemes, crystalline cohomology, Galois repre- 
sentations, modular forms, deformation theory, Gorenstein rings, Euler 
systems and many others. He uses, in an essential way, concepts due to 
many mathematicians from around the world who were thinking about 
very different questions. 

The work of Wiles is a tour de force, and will stand as one of the sci- 
entific achievements of the century. His work is not to be seen in iso- 
lation, but rather as the culmination of much recent thinking in many 
directions. Wiles’ proof, starting from scratch, would surely be over a 
thousand pages long. 

The story of this important discovery is a tribute to the deeper and more 
abstruse levels of abstract understanding that mathematicians have long 
claimed is essential. Many of us, while hailing Wiles’ magnificent achievement,
yearn for Fermat to have been correct, and for the truly marvellous,
and presumably comparatively straightforward, proof to be recovered.


We refer the reader to 
[31] for more about elliptic 
curves and Diophantine 
equations. We also rec- 
ommend the expository 
article of Cox, Introduction 
to Fermat’s Last Theorem, 
Amer. Math. Monthly 101 
(1994), pp. 3-14, for more 
details. 
A.1 Functions

Pick up any calculus book; somewhere near the beginning is a definition of 
function which reads something like this: a function f is a rule that assigns to 
each element a in a set A exactly one element, denoted by f(a), in a set B. Actually,
this isn’t too bad. The spirit is right: $f$ is dynamic; it is like a machine,
whose input consists of elements of $A$ and whose output consists of certain
elements of $B$. The sets $A$ and $B$ may be made up of numbers, but they don’t
have to be.

There is a slight notational surprise. We are used to writing a function, not
as $f$, but as $f(x)$. For example, integrals are written
$$\int f(x)\,dx.$$
Logically, one notation for a function, say $f$, and another for its value at a
point $a$ in $A$, say $f(a)$, does make sense. However, some notation is
grandfathered in. For example, we will continue to write polynomials as
$f(x) = a_nx^n + a_{n-1}x^{n-1} + \cdots + a_0$, trigonometric functions as $\sin x$ and $\cos x$, and
the exponential as $e^x$ (but some authors denote the exponential function by
exp). Still, the simpler notation $f$ is usually a good idea.

One problem we have with the calculus definition of function involves the
word rule. To see why this causes problems, we ask when two functions are
equal. If $f$ is the function $f(x) = x^2 + 2x + 1$ and $g$ is the function
$g(x) = (x + 1)^2$, is $f = g$? We usually think of a rule as a recipe, a set of directions.
With this understanding, $f$ and $g$ are surely different: $f(5) = 25 + 10 + 1$ and
$g(5) = 6^2$. These are different recipes, but note that both cook the same dish:
for example, $f(5) = 36 = g(5)$.

A second problem with the calculus definition is just what is allowed to be
a rule. Must a rule be a formula? If so, then $f(x)$, defined by
$$f(x) = \begin{cases} 1 & \text{if } x \text{ is rational} \\ 0 & \text{if } x \text{ is irrational,} \end{cases}$$
is not a function. Or is it? The simplest way to deal with these problems is to
avoid the imprecise word rule.

If A is a set, then we write 

$$a \in A,$$

which abbreviates “a belongs to A” or “a is an element of A.”

We can define ordered
pair from scratch: see
Exercise A.2 on page 418.
Cartesian product could
then be defined by induction
on $n$.


Definition. If $A_1, A_2, \ldots, A_n$ are sets, their cartesian product is
$$A_1 \times A_2 \times \cdots \times A_n = \{(a_1, a_2, \ldots, a_n) : a_i \in A_i \text{ for all } i\}.$$
An element $(a_1, a_2) \in A_1 \times A_2$ is called an ordered pair, and $(a_1, a_2) = (a_1', a_2')$
if and only if $a_1 = a_1'$ and $a_2 = a_2'$. More generally, two $n$-tuples
$(a_1, a_2, \ldots, a_n)$ and $(a_1', a_2', \ldots, a_n')$ are equal if $a_i = a_i'$ for all subscripts $i$.

We now review subsets and equality, for functions $f\colon A \to B$ are subsets
of $A \times B$. We say that $U$ is a subset of $V$ or $U$ is contained in $V$, denoted by
$$U \subseteq V,$$
if, for all $u \in U$, we have $u \in V$. Formally: $(\forall u)(u \in U \Rightarrow u \in V)$.

Two subsets $U$ and $V$ of a set $A$ are equal, that is,
$$U = V,$$
if they are comprised of exactly the same elements: thus, $U = V$ if and only
if $U \subseteq V$ and $V \subseteq U$. This obvious remark is important because many proofs
of equality of subsets break into two parts, each part showing that one subset
is contained in the other. For example, let
$$U = \{x \in \mathbb{R} : x \ge 0\} \quad\text{and}\quad V = \{x \in \mathbb{R} : \text{there exists } y \in \mathbb{R} \text{ with } x = y^2\}.$$
Now $U \subseteq V$ because $x = (\sqrt{x})^2 \in V$, while $V \subseteq U$ because $y^2 \ge 0$ for
every real number $y$ (if $y < 0$, then $y = -a$ for $a > 0$ and $y^2 = a^2$). Hence,
$U = V$.

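If $U$ is a proper subset of $V$; that is, if $U \subseteq V$ but $U \ne V$, then we write
$$U \subsetneq V.$$

An empty set, denoted by $\varnothing$, is a set with no elements. Given a set $X$, it is
always true that $\varnothing \subseteq X$. To see this, observe that the negation
$$(\exists u)(u \in \varnothing \text{ and } u \notin X)$$
is false, for there is no $u \in \varnothing$. The empty set is unique: if $\varnothing'$ is also an empty
set, then $\varnothing \subseteq \varnothing'$ and $\varnothing' \subseteq \varnothing$, so that $\varnothing = \varnothing'$. There is only one empty set.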

Informally, a function is what we usually call its graph. 

Definition. Let $A$ and $B$ be sets. A function $f: A \to B$ is a subset $f \subseteq A \times B$
such that, for each $a \in A$, there is a unique $b \in B$ with $(a, b) \in f$. The set $A$
is called its domain, and the set $B$ is called its target.

If $f$ is a function and $(a, b) \in f$, then we write $f(a) = b$ and we call $b$
the value of $f$ at $a$. Define the image (or range) of $f$, denoted by $\operatorname{im} f$, to be
the subset of $B$ consisting of all the values of $f$.
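
The definition can be tested mechanically when $A$ and $B$ are finite. The following Python sketch is only an illustration: the names `is_function` and `image` and the sample sets are ours, and a function is modeled literally as a set of ordered pairs.

```python
def is_function(f, A, B):
    """True if the set of pairs f is a function A -> B: f is a subset of
    A x B, and each a in A is the first coordinate of exactly one pair."""
    if not all(a in A and b in B for (a, b) in f):
        return False
    firsts = [a for (a, _) in f]
    return all(firsts.count(a) == 1 for a in A)

def image(f):
    """The set of all values of f."""
    return {b for (_, b) in f}

A = {-2, -1, 0, 1, 2}
B = {0, 1, 2, 3, 4}
square = {(a, a * a) for a in A}

print(is_function(square, A, B))   # True
print(image(square) == {0, 1, 4})  # True: the image is a proper subset of B

# Two different values at 1: not single-valued, hence not a function.
print(is_function({(0, 0), (1, 1), (1, 2)}, {0, 1}, B))   # False
# No value at 1: the domain is not all of {0, 1}.
print(is_function({(0, 0)}, {0, 1}, B))                   # False
```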

When we say that $A$ is the domain of a function $f: A \to B$, we mean that
$f(a)$ is defined for every $a \in A$. Thus, the reciprocal $f(x) = 1/(x - 1)$ is not
a function $\mathbb{R} \to \mathbb{R}$, but it is a function $\mathbb{R}' \to \mathbb{R}$, where $\mathbb{R}'$ denotes the set of all
real numbers not equal to 1.

The second problem above (is $f: \mathbb{R} \to \mathbb{R}$ a function, where $f(x) = 1$ if
$x$ is rational and $f(x) = 0$ if $x$ is irrational?) can now be resolved: yes, $f$ is a
function:

$$f = \{(x, 1) : x \text{ is rational}\} \cup \{(x, 0) : x \text{ is irrational}\} \subseteq \mathbb{R} \times \mathbb{R}.$$

Let’s look at more examples before resolving the first problem arising from
the imprecise term rule.

Example A.1. (i) The squaring function $f: \mathbb{R} \to \mathbb{R}$, given by $f(x) = x^2$,
is the parabola consisting of all points in the plane $\mathbb{R} \times \mathbb{R}$ of the form
$(a, a^2)$. It satisfies the definition, and so $f$ is a function (if $f$ wasn’t a
bona fide function, we would change the definition!).

(ii) If $A$ and $B$ are sets and $b_0 \in B$, then the constant function at $b_0$ is
the function $f: A \to B$ defined by $f(a) = b_0$ for all $a \in A$ (when
$A = \mathbb{R} = B$, then the graph of a constant function is a horizontal line).

(iii) For a set $A$, the identity function $1_A: A \to A$ is the function consisting of
all $(a, a)$ (the diagonal of $A \times A$), and $1_A(a) = a$ for all $a \in A$.

(iv) The usual functions appearing in calculus are also functions according to
the definition just given. For example, the domain of $\sin x$ is $\mathbb{R}$, its target
is usually $\mathbb{R}$, and its image is the closed interval $[-1, 1]$. ▲


How to Think About It. A function $f: A \to B$ is the subset of $A \times B$
consisting of all the ordered pairs $(a, f(a))$ (this subset is the function but,
informally, it is usually called its graph). In order to maintain the spirit of a
function being dynamic, we often use the notation

$$f: a \mapsto b$$

instead of $f(a) = b$. For example, we may write the squaring function as
$f: a \mapsto a^2$ instead of $f(a) = a^2$. We often say that $f$ sends $a$ to $f(a)$.


Let’s return to our first complaint about rules; when are two functions equal? 

Proposition A.2. Let $f: A \to B$ and $g: A \to B$ be functions. Then $f = g$ if
and only if $f(a) = g(a)$ for every $a \in A$.

Proof. Assume that $f = g$. Functions are subsets of $A \times B$, and so $f = g$
means that each of $f$ and $g$ is a subset of the other. If $a \in A$, then $(a, f(a)) \in
f$; since $f = g$, we have $(a, f(a)) \in g$. But there is only one ordered pair in
$g$ with first coordinate $a$, namely, $(a, g(a))$, because the definition of function
says that $g$ gives a unique value to $a$. Therefore, $(a, f(a)) = (a, g(a))$, and
equality of ordered pairs gives $f(a) = g(a)$, as desired.

Conversely, assume that $f(a) = g(a)$ for every $a \in A$. To see that $f = g$,
it suffices to show that $f \subseteq g$ and $g \subseteq f$. Each element of $f$ has the form
$(a, f(a))$. Since $f(a) = g(a)$, we have $(a, f(a)) = (a, g(a))$ and hence
$(a, f(a)) \in g$. Therefore, $f \subseteq g$. The reverse inclusion $g \subseteq f$ is proved
similarly. Therefore, $f = g$. ■

Proposition A.2 resolves the first problem arising from the term rule: if
$f, g: \mathbb{R} \to \mathbb{R}$ are given by $f(x) = x^2 + 2x + 1$ and $g(x) = (x + 1)^2$, then
$f = g$ because $f(a) = g(a)$ for every number $a$.

Let us clarify another point. Can functions $f: A \to B$ and $g: A' \to B'$ be
equal? Here is the commonly accepted usage.

Definition. Functions $f: A \to B$ and $g: A' \to B'$ are equal if $A = A'$,
$B = B'$, and $f(a) = g(a)$ for all $a \in A$.

Thus, a function $f: A \to B$ has three ingredients (its domain $A$, its target
$B$, and its graph), and we are saying that functions are equal if and only if
they have the same domains, the same targets, and the same graphs.

It is plain that the domain and the graph are essential parts of a function;
why should we care about the target of a function when its image is more
important? (See the discussion on page 437 for a more sophisticated reason why
targets are important.) As a practical matter, when first defining a function, one
usually doesn’t know its image. For example, what’s the image of $f: \mathbb{R} \to \mathbb{R}$, defined
by


fix) = log 


i + i*r* ^ _ f x dt 9 
\J x 1 4- cos 2 x J Jo \/l + t 6 


We must analyze $f$ to find its image, and this is no small task. But if targets
have to be images, then we couldn’t even write down $f: \mathbb{R} \to \mathbb{R}$ without
having first found the image of $f$. Thus, targets are convenient to use.

If $A$ is a subset of a set $B$, the inclusion $i: A \to B$ is the function given by
$i(a) = a$ for all $a \in A$; that is, $i$ is the subset of $A \times B$ consisting of all $(a, a)$
with $a \in A$. If $S$ is a proper subset of a set $A$, then the inclusion $i: S \to A$ is
not the identity function $1_S$ because its target is $A$, not $S$; it is not the identity
function $1_A$ because its domain is $S$, not $A$.

Instead of saying that the values of a function $f$ are unique, we sometimes
say that $f$ is single-valued or that it is well-defined. For example, if $\mathbb{R}^{\ge}$
denotes the set of nonnegative reals, then $\sqrt{\ }: \mathbb{R}^{\ge} \to \mathbb{R}^{\ge}$ is a function because
we agree that $\sqrt{a} \ge 0$ for every nonnegative number $a$. On the other hand,
$g(a) = \pm\sqrt{a}$ is not single-valued, and hence it is not a function. The simplest
way to verify whether an alleged function $f$ is single-valued is to phrase
uniqueness of values as an implication:

$$\text{if } a = a', \text{ then } f(a) = f(a').$$

For example, consider the addition function $\alpha: \mathbb{R} \times \mathbb{R} \to \mathbb{R}$. To say that $\alpha$
is well-defined is to say that if $(u, v) = (u', v')$ in $\mathbb{R} \times \mathbb{R}$, then $\alpha(u, v) =
\alpha(u', v')$; that is, if $u = u'$ and $v = v'$, then $u + v = u' + v'$. This is usually
called the Law of Substitution.

Another example is addition of fractions. We define

$$\frac{a}{b} + \frac{c}{d} = \frac{ad + bc}{bd}.$$

But fractions have many names. If $a/b = a'/b'$ and $c/d = c'/d'$, is $(ad +
bc)/bd = (a'd' + b'c')/b'd'$? We verified that this formula does not depend
on the choices of names of the fractions on page 193. On the other hand, the
operation

$$\frac{a}{b} * \frac{c}{d} = \frac{a + c}{bd}$$

is not well-defined: $\frac{1}{2} = \frac{2}{4}$, but $\frac{1}{2} * \frac{3}{4} = \frac{4}{8}$, while $\frac{2}{4} * \frac{3}{4} = \frac{5}{16} \ne \frac{4}{8}$.
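
The difference between the two operations can be checked by a short computation. In the Python sketch below (our own illustration, with hypothetical helper names `add_names`, `star_names`, and `value`), a "name" of a fraction is a pair (numerator, denominator), and `fractions.Fraction` from the standard library supplies the rational number that a name represents.

```python
from fractions import Fraction

def add_names(p, q):
    """(a, b) and (c, d) go to (ad + bc, bd): the usual sum of fractions."""
    (a, b), (c, d) = p, q
    return (a * d + b * c, b * d)

def star_names(p, q):
    """(a, b) and (c, d) go to (a + c, bd): the operation * of the text."""
    (a, b), (c, d) = p, q
    return (a + c, b * d)

def value(p):
    """The rational number named by the pair p."""
    return Fraction(p[0], p[1])

half, two_quarters, three_quarters = (1, 2), (2, 4), (3, 4)

# Addition gives the same rational for either name of 1/2.
print(value(add_names(half, three_quarters)) ==
      value(add_names(two_quarters, three_quarters)))    # True

# The operation * does not: 4/8 versus 5/16.
print(value(star_names(half, three_quarters)) ==
      value(star_names(two_quarters, three_quarters)))   # False
```

Of course, a check on one pair of names can only refute well-definedness; establishing it for addition requires the general argument cited above.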

There is a name for functions whose image is equal to the whole target.

Definition. A function $f: A \to B$ is surjective (or onto or a surjection) if

$$\operatorname{im} f = B.$$

Thus, $f$ is surjective if, for each $b \in B$, there is some $a \in A$ (depending
on $b$) with $b = f(a)$.


Example A.3. (i) The identity function $1_A: A \to A$ is a surjection.

(ii) The sine function $\mathbb{R} \to \mathbb{R}$ is not surjective, for its image is $[-1, 1]$, a
proper subset of its target $\mathbb{R}$. The function $s: \mathbb{R} \to [-1, 1]$, defined by
$s(x) = \sin x$, is surjective.

(iii) The functions $x^2: \mathbb{R} \to \mathbb{R}$ and $e^x: \mathbb{R} \to \mathbb{R}$ have target $\mathbb{R}$. Now $\operatorname{im} x^2$
consists of the nonnegative reals and $\operatorname{im} e^x$ consists of the positive reals,
so that neither $x^2$ nor $e^x$ is surjective.

(iv) Let $f: \mathbb{R} \to \mathbb{R}$ be defined by

$$f(a) = 6a + 4.$$

To see whether $f$ is a surjection, we ask whether every $b \in \mathbb{R}$ has the
form $b = f(a)$ for some $a$; that is, given $b$, can we find $a$ so that

$$6a + 4 = b?$$

Since $a = \tfrac{1}{6}(b - 4)$, this equation can always be solved for $a$, and so $f$
is a surjection.

(v) Let $f: \mathbb{R} - \{\tfrac{3}{2}\} \to \mathbb{R}$ be defined by

$$f(a) = \frac{6a + 4}{2a - 3}.$$

To see whether $f$ is a surjection, we seek, given $b$, a solution $a$: can we
solve

$$b = f(a) = \frac{6a + 4}{2a - 3}?$$

This leads to the equation $a(6 - 2b) = -3b - 4$, which can be solved
for $a$ if $6 - 2b \ne 0$ (note that $(-3b - 4)/(6 - 2b) \ne 3/2$). On the other
hand, it suggests that there is no solution when $b = 3$ and, indeed, there is
not: if $(6a + 4)/(2a - 3) = 3$, cross multiplying gives the false equation
$6a + 4 = 6a - 9$. Thus, $3 \notin \operatorname{im} f$, and $f$ is not a surjection (in fact,
$\operatorname{im} f = \mathbb{R} - \{3\}$). ▲
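
The computation in part (v) can be replayed with exact rational arithmetic. This Python sketch is an illustration only: the function name `preimage` is ours, and we test rational values of $b$ because `fractions.Fraction` avoids rounding error; the text's claim concerns all real $b \ne 3$.

```python
from fractions import Fraction

def f(a):
    """f(a) = (6a + 4)/(2a - 3), defined for a != 3/2."""
    return (6 * a + 4) / (2 * a - 3)

def preimage(b):
    """Solve a(6 - 2b) = -3b - 4 for a; return None when b = 3."""
    b = Fraction(b)
    if b == 3:
        return None
    return (-3 * b - 4) / (6 - 2 * b)

for b in [Fraction(-7, 2), 0, 1, 5, 100]:
    a = preimage(b)
    # The solution never equals the excluded point 3/2, and f(a) recovers b.
    print(b, a, a != Fraction(3, 2), f(a) == b)

print(preimage(3))   # None: the value 3 is not in the image
```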


The following definition gives another important property a function may 
have. 


Definition. A function $f: A \to B$ is injective (or one-to-one or an injection)
if, whenever $a$ and $a'$ are distinct elements of $A$, then $f(a) \ne f(a')$. Equivalently,
the contrapositive states that $f$ is injective if, for every pair $a, a' \in A$,
we have

$$f(a) = f(a') \text{ implies } a = a'.$$

Being injective is the converse of being single-valued: $f$ is single-valued if
$a = a'$ implies $f(a) = f(a')$; $f$ is injective if $f(a) = f(a')$ implies $a = a'$.

Most functions are neither injective nor surjective. For example, the squaring
function $f: \mathbb{R} \to \mathbb{R}$, defined by $f(x) = x^2$, is neither.



which simplifies to $26a = 26b$ and hence $a = b$. We conclude that $f$ is
injective. (We saw, in Example A.3, that $f$ is not surjective.)

(iii) Consider $f: \mathbb{R} \to \mathbb{R}$, given by $f(x) = x^2 - 2x - 3$. If we try to check
whether $f$ is an injection by looking at the consequences of $f(a) =
f(b)$, as in part (ii), we arrive at the equation $a^2 - 2a = b^2 - 2b$; it
is not instantly clear whether this forces $a = b$. Instead, we seek the
roots of $f$, which are 3 and $-1$. It follows that $f$ is not injective, for
$f(3) = 0 = f(-1)$; that is, there are two distinct numbers having the
same value. ▲


Sometimes there is a way of combining two functions to form another func- 
tion, their composite. 

Definition. If $f: A \to B$ and $g: B \to C$ are functions (the target of $f$ is the
domain of $g$), then their composite, denoted by $g \circ f$, is the function $A \to C$
given by

$$g \circ f: a \mapsto g(f(a));$$

that is, first evaluate $f$ on $a$, and then evaluate $g$ on $f(a)$.

We usually abbreviate the notation for composites in the text, writing $gf$
instead of $g \circ f$, but we shall always write $g \circ f$ in this Appendix.

Composition is thus a two-step process: $a \mapsto f(a) \mapsto g(f(a))$. For example,
the function $h: \mathbb{R} \to \mathbb{R}$, defined by $h(x) = e^{\cos x}$, is the composite $g \circ f$,
where $f(x) = \cos x$ and $g(x) = e^x$. This factorization is plain as soon as one
tries to evaluate, say, $h(\pi)$; one must first evaluate $f(\pi) = \cos \pi = -1$ and
then evaluate

$$h(\pi) = g(f(\pi)) = g(-1) = e^{-1}.$$

The chain rule in calculus is a formula that computes the derivative $(g \circ f)'$
in terms of $g'$ and $f'$:

$$(g \circ f)'(x) = (g' \circ f)(x) \cdot f'(x) = g'(f(x)) \cdot f'(x).$$

If $f: A \to B$ is a function, and if $S$ is a subset of $A$, then the restriction
of $f$ to $S$ is the function $f|S: S \to B$, defined by $(f|S)(s) = f(s)$ for all
$s \in S$. It is easy to see that if $i: S \to A$ is the inclusion, then $f|S = f \circ i$; that
is, the functions $f|S$ and $f \circ i$ have the same domain, the same target, and the
same graph (see Exercise A.4 on page 419).

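If $f: \mathbb{N} \to \mathbb{N}$ and $g: \mathbb{N} \to \mathbb{R}$ are functions, then $g \circ f: \mathbb{N} \to \mathbb{R}$ is defined,
but $f \circ g$ is not defined (for $\operatorname{target}(g) = \mathbb{R} \ne \mathbb{N} = \operatorname{domain}(f)$). Even when
$f: A \to B$ and $g: B \to A$, so that both composites $g \circ f$ and $f \circ g$ are defined,
they need not be equal. For example, define $f, g: \mathbb{N} \to \mathbb{N}$ by $f: n \mapsto n^2$ and
$g: n \mapsto 3n$; then $g \circ f: 2 \mapsto g(4) = 12$ and $f \circ g: 2 \mapsto f(6) = 36$. Hence,

$$g \circ f \ne f \circ g.$$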
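
The example just given is easy to replay in code. In this Python sketch the helper `compose` is our own name; it builds $g \circ f$ exactly as in the definition, by evaluating $f$ first and then $g$.

```python
def compose(g, f):
    """Return the composite g o f, that is, a -> g(f(a))."""
    return lambda a: g(f(a))

def f(n):
    return n * n    # f: n -> n^2

def g(n):
    return 3 * n    # g: n -> 3n

g_after_f = compose(g, f)   # n -> 3 n^2
f_after_g = compose(f, g)   # n -> (3n)^2

print(g_after_f(2))   # 12
print(f_after_g(2))   # 36
```

Note that Python does not record domains or targets, so `compose` will happily combine functions whose target and domain do not match; the bookkeeping required by the definition is left to the user.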

Given a set $A$, let

$$A^A = \{\text{all functions } A \to A\}.$$

The composite of two functions in $A^A$ is always defined, and it is, again, a
function in $A^A$. As we have just seen, composition is not commutative; that is,
$f \circ g$ and $g \circ f$ need not be equal. Let us now show that composition is always
associative.

Proposition A.5. Composition of functions is associative: given functions $f:
A \to B$, $g: B \to C$, and $h: C \to D$, then

$$h \circ (g \circ f) = (h \circ g) \circ f.$$

Proof. We show that the value of either composite on an element $a \in A$ is just
$d = h(g(f(a)))$. If $a \in A$, then

$$h \circ (g \circ f): a \mapsto (g \circ f)(a) = g(f(a)) \mapsto h(g(f(a))) = d,$$

and

$$(h \circ g) \circ f: a \mapsto f(a) \mapsto (h \circ g)(f(a)) = h(g(f(a))) = d.$$

Since both are functions $A \to D$, it follows from Proposition A.2 that the
composites are equal. ■

In light of this proposition, we need not write parentheses: the notation
$h \circ g \circ f$ is unambiguous.

Suppose that $f: A \to B$ and $g: C \to D$ are functions. If $B \subseteq C$, then
some authors define the composite $h: A \to D$ by $h(a) = g(f(a))$. We do
not allow composition if $B \ne C$. However, we can define $h$ as the composite
$h = g \circ i \circ f$, where $i: B \to C$ is the inclusion.

The next result implies that the identity function $1_A$ behaves for composition
in $A^A$ just as the number one does for multiplication of numbers.

Proposition A.6. If $f: A \to B$, then $1_B \circ f = f = f \circ 1_A$.

Proof. If $a \in A$, then

$$1_B \circ f: a \mapsto f(a) \mapsto f(a)$$

and

$$f \circ 1_A: a \mapsto a \mapsto f(a). \quad ■$$

Are there “reciprocals” in $A^A$; that is, are there any functions $f: A \to A$
for which there is $g \in A^A$ with $f \circ g = 1_A$ and $g \circ f = 1_A$? The following
discussion will allow us to answer this question.

Definition. A function $f: A \to B$ is bijective (or a one-one correspondence
or a bijection) if it is both injective and surjective.

Example A.7. (i) Identity functions are always bijections.

(ii) Let $X = \{1, 2, 3\}$ and define $f: X \to X$ by

$$f(1) = 2, \quad f(2) = 3, \quad f(3) = 1.$$

It is easy to see that $f$ is a bijection. ▲

We can draw a picture of a function $f: X \to Y$ in the special case when
$X$ and $Y$ are finite sets (see Figure A.1). Let $X = \{1, 2, 3, 4, 5\}$, let $Y =
\{a, b, c, d, e\}$, and define $f: X \to Y$ by

$$f(1) = b, \quad f(2) = e, \quad f(3) = a, \quad f(4) = b, \quad f(5) = c.$$

Now $f$ is not injective, because $f(1) = b = f(4)$, and $f$ is not surjective,
because there is no $x \in X$ with $f(x) = d$. Can we reverse the arrows to get
a function $g: Y \to X$? There are two reasons why we can’t. First, there is no
arrow going to $d$, and so $g(d)$ is not defined. Second, what is $g(b)$? Is it 1 or
is it 4? The first problem is that the domain of $g$ is not all of $Y$, and it arises
because $f$ is not surjective; the second problem is that $g$ is not single-valued,
and it arises because $f$ is not injective (this reflects the fact that being
single-valued is the converse of being injective). Neither problem arises when $f$ is a
bijection.

Definition. A function $f: X \to Y$ is invertible if there exists a function
$g: Y \to X$, called its inverse, with both composites $g \circ f$ and $f \circ g$ being
identity functions.

We do not say that every function $f$ is invertible; on the contrary, we have
just given two reasons why a function may not have an inverse. Notice that if
an inverse function $g$ does exist, then it “reverses the arrows” in Figure A.1. If
$f(a) = y$, then there is an arrow from $a$ to $y$. Now $g \circ f$ being the identity
says that $a = (g \circ f)(a) = g(f(a)) = g(y)$; therefore $g: y \mapsto a$, and so the
picture of $g$ is obtained from the picture of $f$ by reversing arrows. If $f$ twists
something, then its inverse $g$ untwists it.
Lemma A.8. If $f: X \to Y$ and $g: Y \to X$ are functions such that $g \circ f = 1_X$,
then $f$ is injective and $g$ is surjective.

Proof. Suppose that $f(a) = f(a')$. Apply $g$ to obtain $g(f(a)) = g(f(a'))$;
that is, $a = a'$ (because $g(f(a)) = a$), and so $f$ is injective. If $x \in X$, then
$x = g(f(x))$, so that $x \in \operatorname{im} g$; hence $g$ is surjective. ■

Proposition A.9. A function $f: X \to Y$ has an inverse $g: Y \to X$ if and only
if it is a bijection.

Proof. If $f$ has an inverse $g$, then Lemma A.8 shows that $f$ is injective and
surjective, for both composites $g \circ f$ and $f \circ g$ are identities.

Assume that $f$ is a bijection. Let $y \in Y$. Since $f$ is surjective, there is some
$a \in X$ with $f(a) = y$; since $f$ is injective, $a$ is unique. Defining $g(y) = a$
thus gives a (single-valued) function whose domain is $Y$ ($g$ merely “reverses
arrows”: since $f(a) = y$, there is an arrow from $a$ to $y$, and the reversed arrow
goes from $y$ to $a$). It is plain that $g$ is the inverse of $f$; that is, $f(g(y)) =
f(a) = y$ for all $y \in Y$ and $g(f(a)) = g(y) = a$ for all $a \in X$. ■
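
For finite sets, the proof of Proposition A.9 is effectively an algorithm: reverse every arrow, after checking that doing so is legitimate. The Python sketch below is our own illustration; functions are stored as dictionaries, and the names `is_injective`, `is_surjective`, and `inverse` are hypothetical helpers, not notation from the text.

```python
def is_injective(f):
    """Distinct keys have distinct values."""
    return len(set(f.values())) == len(f)

def is_surjective(f, target):
    """Every element of the target is a value."""
    return set(f.values()) == set(target)

def inverse(f, target):
    """Reverse the arrows of f: X -> target, or return None if f is not a bijection."""
    if not (is_injective(f) and is_surjective(f, target)):
        return None
    return {y: x for x, y in f.items()}

# The function discussed with Figure A.1: neither injective nor surjective.
Y = {"a", "b", "c", "d", "e"}
f = {1: "b", 2: "e", 3: "a", 4: "b", 5: "c"}
print(inverse(f, Y))   # None

# The bijection of Example A.7(ii) and its inverse.
cycle = {1: 2, 2: 3, 3: 1}
g = inverse(cycle, {1, 2, 3})
print(all(g[cycle[x]] == x for x in cycle))   # True: g o f is the identity
print(all(cycle[g[y]] == y for y in g))       # True: f o g is the identity
```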

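Example A.10. If $a$ is a real number, then multiplication by $a$ is the function
$\mu_a: \mathbb{R} \to \mathbb{R}$, defined by $r \mapsto ar$ for all $r \in \mathbb{R}$. If $a \ne 0$, then $\mu_a$ is a bijection;
its inverse function is division by $a$, namely, $\delta_a: \mathbb{R} \to \mathbb{R}$, defined by $r \mapsto \tfrac{1}{a} r$;
of course, $\delta_a = \mu_{1/a}$. If $r \in \mathbb{R}$, then

$$\mu_a \circ \delta_a: r \mapsto \tfrac{1}{a} r \mapsto a \cdot \tfrac{1}{a} r = r;$$

hence, $\mu_a \circ \delta_a = 1_{\mathbb{R}}$. Similarly, $\delta_a \circ \mu_a = 1_{\mathbb{R}}$.

If $a = 0$, however, then $\mu_a = \mu_0$ is the constant function $\mu_0: r \mapsto 0$ for all
$r \in \mathbb{R}$, which has no inverse function. ▲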


Etymology. The inverse of a bijection $f$ is denoted by $f^{-1}$ (Exercise A.6
on page 419 says that a function cannot have two inverses). This is the same
notation used for inverse trigonometric functions in calculus; for example,
$\sin^{-1} x = \arcsin x$ satisfies $\sin(\arcsin(x)) = x$ and $\arcsin(\sin(x)) = x$. Of
course, $\sin^{-1}$ does not denote the reciprocal $1/\sin x$, which is $\csc x$.


Example A.11. Here is an example of two functions $f, g: \mathbb{N} \to \mathbb{N}$ with one
composite the identity, but with the other composite not the identity; thus, $f$
and $g$ are not inverse functions.

Define $f, g: \mathbb{N} \to \mathbb{N}$ as follows:

$$f(n) = n + 1; \qquad g(n) = \begin{cases} 0 & \text{if } n = 0 \\ n - 1 & \text{if } n \ge 1. \end{cases}$$

The composite $g \circ f = 1_{\mathbb{N}}$, for $g(f(n)) = g(n + 1) = n$ (because $n + 1 \ge 1$).
On the other hand, $f \circ g \ne 1_{\mathbb{N}}$ because $f(g(0)) = f(0) = 1 \ne 0$. ▲
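
A few lines of Python make the asymmetry visible. This is only a spot check on an initial segment of $\mathbb{N}$ (the bound 50 is an arbitrary choice of ours), not a proof.

```python
def f(n):
    return n + 1

def g(n):
    return 0 if n == 0 else n - 1

# g o f fixes every n we try ...
print(all(g(f(n)) == n for n in range(50)))   # True

# ... but f o g moves 0 to 1, so f o g is not the identity.
print(f(g(0)))                                # 1
```

In the language of Lemma A.8, $f$ is injective but not surjective (0 is not a value of $f$), while $g$ is surjective but not injective ($g(0) = g(1)$).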


What are the domain and image of arcsin?


Two strategies are now available to determine whether a given function is
a bijection: use the definitions of injective and surjective, or find an inverse
function. For example, if $\mathbb{R}^{>}$ denotes the positive real numbers, let us show
that the exponential function $f: \mathbb{R} \to \mathbb{R}^{>}$, defined by $f(x) = e^x = \sum x^n/n!$,
is a bijection. A direct proof that $f$ is injective would require showing that if
$e^a = e^b$, then $a = b$; a direct proof showing that $f$ is surjective would involve
showing that every positive real number $c$ has the form $e^a$ for some $a$. It is
simplest to prove these statements using the (natural) logarithm $g(y) = \log y$.
The usual formulas $e^{\log y} = y$ and $\log e^x = x$ show that both composites $f \circ g$
and $g \circ f$ are identities, and so $f$ and $g$ are inverse functions. Therefore, $f$ is
a bijection, for it has an inverse.

What are the domains and targets of log and exp?

The next theorem summarizes some results of this section. If $X$ is a nonempty
set, define the symmetric group on $X$:

$$S_X = \{\text{bijections } \sigma: X \to X\}.$$

Theorem A.12. If $X$ is a nonempty set, then composition $(f, g) \mapsto g \circ f$ is a
function $S_X \times S_X \to S_X$ satisfying the following properties:

(i) $(f \circ g) \circ h = f \circ (g \circ h)$ for all $f, g, h \in S_X$;

(ii) there is $1_X \in S_X$ with $1_X \circ f = f = f \circ 1_X$ for all $f \in S_X$;

(iii) for all $f \in S_X$, there is $f' \in S_X$ with $f' \circ f = 1_X = f \circ f'$.

Proof. Exercise A.12(iii) on page 420 says that the composite of two bijections
is itself a bijection, and so composition has target $S_X$. Part (i) is Proposition A.5,
part (ii) is Proposition A.6, and part (iii) is Proposition A.9. ■
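
For a small set, the three properties can be verified by brute force. The Python sketch below does this for $X = \{1, 2, 3\}$; the representation of a bijection as a dictionary and the names `S_X`, `compose`, and `identity` are our own choices, and the check is an illustration for one set, not a substitute for the proof.

```python
from itertools import permutations

X = (1, 2, 3)

# Every bijection X -> X, stored as a dict; there are 3! = 6 of them.
S_X = [dict(zip(X, values)) for values in permutations(X)]

def compose(g, f):
    """The composite g o f as a dict: x -> g(f(x))."""
    return {x: g[f[x]] for x in X}

identity = {x: x for x in X}

closed = all(compose(g, f) in S_X for f in S_X for g in S_X)
associative = all(
    compose(compose(f, g), h) == compose(f, compose(g, h))
    for f in S_X for g in S_X for h in S_X
)
has_identity = all(compose(identity, f) == f == compose(f, identity) for f in S_X)
has_inverses = all(
    any(compose(g, f) == identity == compose(f, g) for g in S_X) for f in S_X
)

print(len(S_X), closed, associative, has_identity, has_inverses)
# 6 True True True True
```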


Exercises 

A.1 True or false, with reasons.

(i) If $S \subseteq T$ and $T \subseteq X$, then $S \subseteq X$.

(ii) Any two functions $f: X \to Y$ and $g: Y \to Z$ have a composite $f \circ g: X \to Z$.

(iii) Any two functions $f: X \to Y$ and $g: Y \to Z$ have a composite $g \circ f: X \to Z$.

(iv) For every set $X$, we have $X \times \varnothing = \varnothing$.

(v) If $f: X \to Y$ and $j: \operatorname{im} f \to Y$ is the inclusion, then there is a surjection
$g: X \to \operatorname{im} f$ with $f = j \circ g$.

(vi) If $f: X \to Y$ is a function for which there is a function $g: Y \to X$ with
$f \circ g = 1_Y$, then $f$ is a bijection.

(vii) The formula $f(a/b) = (a + b)(a - b)$ is a well-defined function $\mathbb{Q} \to \mathbb{Z}$.

(viii) If $f: \mathbb{N} \to \mathbb{N}$ is given by $f(n) = n + 1$ and $g: \mathbb{N} \to \mathbb{N}$ is given by $g(n) =
n^2$, then the composite $g \circ f$ is $n \mapsto n^2(n + 1)$.

(ix) Complex conjugation $z = a + ib \mapsto \bar{z} = a - ib$ is a bijection $\mathbb{C} \to \mathbb{C}$.

Hint. (i) True. (ii) False. (iii) True. (iv) True. (v) True. (vi) False. (vii) False.
(viii) False. (ix) True.

A.2 * Let $A$ and $B$ be sets, and let $a \in A$ and $b \in B$. Define their ordered pair as
follows:

$$(a, b) = \{a, \{a, b\}\}.$$

If $a' \in A$ and $b' \in B$, prove that $(a', b') = (a, b)$ if and only if $a' = a$ and
$b' = b$.

Hint. In any formal treatment, one is obliged to define new terms carefully. In
particular, in set theory, one must discuss the membership relation $\in$. Does $x \in x$
make sense? If it does, is it ever true? One of the axioms constraining $\in$ is that
the statement $a \in x \in a$ is always false.

A.3 Let $L = \{(x, x) : x \in \mathbb{R}\}$; thus, $L$ is the line in the plane that passes through the
origin and makes an angle of 45° with the $x$-axis.

(i) If $P = (a, b)$ is a point in the plane with $a \ne b$, prove that $L$ is the
perpendicular-bisector of the segment $PP'$ having endpoints $P = (a, b)$
and $P' = (b, a)$.

Hint. You may use Lemma 3.16 and the fact that $(\tfrac{1}{2}(a + c), \tfrac{1}{2}(b + d))$ is
the midpoint of the line segment having endpoints $(a, b)$ and $(c, d)$.

(ii) If $f: \mathbb{R} \to \mathbb{R}$ is a bijection whose graph consists of certain points $(a, b)$ (of
course, $b = f(a)$), prove that the graph of $f^{-1}$ is

$$\{(b, a) : (a, b) \in f\}.$$

A.4 * Let $X$ and $Y$ be sets, and let $f: X \to Y$ be a function.

(i) If $S$ is a subset of $X$, prove that the restriction $f|S$ is equal to the composite
$f \circ i$, where $i: S \to X$ is the inclusion map.

Hint. Use the definition of equality of functions on page 412.

(ii) If $\operatorname{im} f = A \subseteq Y$, prove that there exists a surjection $f': X \to A$ with
$f = j \circ f'$, where $j: A \to Y$ is the inclusion.

A.5 If $f: X \to Y$ has an inverse $g$, show that $g$ is a bijection.

Hint. Does $g$ have an inverse?

A.6 * Show that if $f: X \to Y$ is a bijection, then it has exactly one inverse.

A.7 Show that $f: \mathbb{R} \to \mathbb{R}$, defined by $f(x) = 3x + 5$, is injective and surjective, and
find its inverse.

A.8 Determine whether $f: \mathbb{Q} \times \mathbb{Q} \to \mathbb{Q}$, given by

$$f(a/b, c/d) = (a + c)/(b + d)$$

is a function.

Hint. It isn’t.

A.9 Let $X = \{x_1, \dots, x_m\}$ and $Y = \{y_1, \dots, y_n\}$ be finite sets, where the $x_i$ are
distinct and the $y_j$ are distinct. Show that there is a bijection $f: X \to Y$ if and
only if $|X| = |Y|$; that is, $m = n$.

Hint. If $f$ is a bijection, there are $m$ distinct elements $f(x_1), \dots, f(x_m)$ in $Y$,
and so $m \le n$; using the bijection $f^{-1}$ in place of $f$ gives the reverse inequality
$n \le m$.

A.10 Suppose there are 11 pigeons, each sitting in some pigeonhole. If there are only
10 pigeonholes, prove that there is a hole containing more than one pigeon.

A.11 * (Pigeonhole Principle). If $X$ and $Y$ are finite sets with the same number
of elements, show that the following conditions are equivalent for a function
$f: X \to Y$:

(i) $f$ is injective (ii) $f$ is bijective (iii) $f$ is surjective.

Hint. If $A \subseteq X$ and $|A| = n = |X|$, then $A = X$; after all, how many elements
are in $X$ but not in $A$?
A.12 * Let $f: X \to Y$ and $g: Y \to Z$ be functions.

(i) If both $f$ and $g$ are injective, prove that $g \circ f$ is injective.

(ii) If both $f$ and $g$ are surjective, prove that $g \circ f$ is surjective.

(iii) If both $f$ and $g$ are bijective, prove that $g \circ f$ is bijective.

(iv) If $g \circ f$ is a bijection, prove that $f$ is an injection and $g$ is a surjection.

A.13 (i) If $f: (-\pi/2, \pi/2) \to \mathbb{R}$ is defined by $a \mapsto \tan a$, then $f$ has an inverse
function $g$; indeed, $g = \arctan$.

Hint. Compute composites.

(ii) Show that each of $\arcsin x$ and $\arccos x$ is an inverse function (of $\sin x$ and
$\cos x$, respectively) as defined in this section. (Domains and targets must be
chosen with care.)

A.2 Equivalence Relations 

When fractions are first discussed in grammar school, students are told that
$\tfrac{1}{3} = \tfrac{2}{6}$ because $1 \times 6 = 3 \times 2$; cross-multiplying makes it so! Don’t believe your
eyes that $1 \ne 2$ and $3 \ne 6$. Doesn’t everyone see that $1 \times 6 = 6 = 3 \times 2$? Of
course, a good teacher wouldn’t just say this. Further explanation is required,
and here it is. We begin with the general notion of relation.

Definition. Let $X$ and $Y$ be sets. A relation from $X$ to $Y$ is a subset $R$ of
$X \times Y$ (if $X = Y$, then we say that $R$ is a relation on $X$). We usually write
$xRy$ instead of $(x, y) \in R$.

Here is a concrete example. Certainly $\le$ should be a relation on $\mathbb{R}$; to see
that it is, define the subset

$$R = \{(x, y) \in \mathbb{R} \times \mathbb{R} : (x, y) \text{ lies on or above the line } y = x\}.$$

You should check that $(x, y) \in R$ if and only if the second coordinate is at least as
large as the first. Thus, $xRy$ here coincides with the usual meaning $x \le y$.

Example A.13. (i) Every function $f: X \to Y$ is a relation from $X$ to $Y$.

(ii) Equality is a relation on any set $X$; it is the diagonal

$$\{(x, x) : x \in X\} \subseteq X \times X.$$

(iii) For every natural number $m$, congruence mod $m$ is a relation on $\mathbb{Z}$. Can
you describe the subset of $\mathbb{Z} \times \mathbb{Z}$?

(iv) If $X = \{(a, b) \in \mathbb{Z} \times \mathbb{Z} : b \ne 0\}$, then cross multiplication defines a
relation $\equiv$ on $X$ by

$$(a, b) \equiv (c, d) \text{ if } ad = bc. \quad ▲$$

Definition. A relation $x \equiv y$ on a set $X$ is

(i) reflexive if $x \equiv x$ for all $x \in X$,

(ii) symmetric if $x \equiv y$ implies $y \equiv x$ for all $x, y \in X$,

(iii) transitive if $x \equiv y$ and $y \equiv z$ imply $x \equiv z$ for all $x, y, z \in X$.

An equivalence relation on a set $X$ is a relation on $X$ that has all three properties:
reflexivity, symmetry, and transitivity.

Example A.14. (i) Ordinary equality is an equivalence relation on any set.

(ii) If $m \ge 0$, then Proposition 4.3 says that $x \equiv y \bmod m$ is an equivalence
relation on $\mathbb{Z}$.

(iii) If $I$ is an ideal in a commutative ring $R$, then Proposition 7.1 shows that
congruence mod $I$ is an equivalence relation on $R$.

(iv) We claim that cross multiplication is an equivalence relation on $X =
\{(a, b) \in \mathbb{Z} \times \mathbb{Z} : b \ne 0\}$. Verification of reflexivity and symmetry is
easy. For transitivity, assume that $(a, b) \equiv (c, d)$ and $(c, d) \equiv (e, f)$.
Now $ad = bc$ gives $adf = bcf$, and $cf = de$ gives $bcf = bde$; thus,
$adf = bde$. We may cancel the nonzero integer $d$ to get $af = be$; that
is, $(a, b) \equiv (e, f)$.

(v) In calculus, equivalence relations are implicit in the discussion of vectors.
An arrow from a point $P$ to a point $Q$ can be denoted by the ordered pair
$(P, Q)$; call $P$ its foot and $Q$ its head. An equivalence relation on arrows
can be defined by saying that $(P, Q) \equiv (P', Q')$ if the arrows have the
same length and the same direction. More precisely, $(P, Q) \equiv (P', Q')$
if the quadrilateral obtained by joining $P$ to $P'$ and $Q$ to $Q'$ is a
parallelogram (this definition is incomplete, for one must also relate collinear
arrows as well as “degenerate” arrows $(P, P)$). The direction of an arrow
from $P$ to $Q$ is important; if $P \ne Q$, then $(P, Q) \not\equiv (Q, P)$. ▲

An equivalence relation on a set $X$ yields a family of subsets of $X$.

Definition. Let $\equiv$ be an equivalence relation on a nonempty set $X$. If $a \in X$,
the equivalence class of $a$, denoted by $[a]$, is defined by

$$[a] = \{x \in X : x \equiv a\} \subseteq X.$$

We now display the equivalence classes arising from the equivalence relations
in Example A.14.

Example A.15. (i) Let $\equiv$ be equality on a set $X$. If $a \in X$, then $[a] = \{a\}$,
the subset having only one element, namely, $a$. After all, if $x = a$, then $x$
and $a$ are equal!

(ii) Consider the relation of congruence mod $m$ on $\mathbb{Z}$, and let $a \in \mathbb{Z}$. The
congruence class of $a$, defined by

$$\{x \in \mathbb{Z} : x = a + km \text{ where } k \in \mathbb{Z}\},$$

is equal to the equivalence class of $a$, namely

$$[a] = \{x \in \mathbb{Z} : x \equiv a \bmod m\}.$$

(iii) If $I$ is an ideal in a commutative ring $R$, then the equivalence class of an
element $a \in R$ is the coset $a + I$.

(iv) The equivalence class of $(a, b)$ under cross multiplication, where $a, b \in \mathbb{Z}$
and $b \ne 0$, is

$$[(a, b)] = \{(c, d) : ad = bc\}.$$

If we denote $[(a, b)]$ by $a/b$, then the equivalence class is precisely the
fraction usually denoted by $a/b$. After all, it is plain that $(1, 3) \ne (2, 6)$,
but $[(1, 3)] = [(2, 6)]$ because $1 \times 6 = 3 \times 2$; that is, $1/3 = 2/6$.

(v) An equivalence class $[(P, Q)]$ of arrows, as in Example A.14, is called a
vector; we denote it by $[(P, Q)] = \overrightarrow{PQ}$. ▲
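
Part (iv) can be explored numerically. An equivalence class under cross multiplication is infinite, so the Python sketch below (our own illustration) looks only at a finite window of pairs with entries between $-6$ and $6$; the names `equivalent`, `WINDOW`, and `equivalence_class` are hypothetical helpers.

```python
from fractions import Fraction

def equivalent(p, q):
    """(a, b) is equivalent to (c, d) when ad = bc."""
    (a, b), (c, d) = p, q
    return a * d == b * c

# A finite window of pairs (a, b) with b != 0.
WINDOW = [(a, b) for a in range(-6, 7) for b in range(-6, 7) if b != 0]

def equivalence_class(p):
    """The part of the class [p] that lies inside WINDOW."""
    return {q for q in WINDOW if equivalent(q, p)}

# Different pairs, same class: (1, 3) != (2, 6) but [(1, 3)] = [(2, 6)].
print((1, 3) != (2, 6))                                     # True
print(equivalence_class((1, 3)) == equivalence_class((2, 6)))   # True

# Every pair in the class names the same rational number 1/3.
print(all(Fraction(c, d) == Fraction(1, 3)
          for (c, d) in equivalence_class((1, 3))))         # True
```

The standard library type `Fraction` makes the same identification internally by storing every fraction in lowest terms.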

It is instructive to compare rational numbers and vectors, for both are defined
as equivalence classes. Every rational $a/b$ has a “favorite” name: its expression
in lowest terms; every vector has a favorite name: an arrow $(O, Q)$
with its foot at the origin $O$. Although it is good to have familiar favorites,
working with fractions in lowest terms is not always convenient; for example,
even if both $a/b$ and $c/d$ are in lowest terms, their sum $(ad + bc)/bd$ and
product $ac/bd$ may not be. Similarly, it is not always best to think of vectors
as arrows with foot at the origin. Vector addition is defined by the parallelogram
law (see Figure A.2): $\overrightarrow{OP} + \overrightarrow{OQ} = \overrightarrow{OR}$, where $O$, $P$, $Q$, and $R$ are the
vertices of a parallelogram. But $\overrightarrow{OQ} = \overrightarrow{PR}$, because $(O, Q) \equiv (P, R)$, and it
is more natural to write $\overrightarrow{OP} + \overrightarrow{OQ} = \overrightarrow{OP} + \overrightarrow{PR} = \overrightarrow{OR}$.



The next lemma says that we can replace equivalence by honest equality at
the cost of replacing elements by their equivalence classes.

Lemma A.16. If $\equiv$ is an equivalence relation on a set $X$, then $x \equiv y$ if and
only if $[x] = [y]$.

Proof. Assume that $x \equiv y$. If $z \in [x]$, then $z \equiv x$, and so transitivity gives
$z \equiv y$; hence $[x] \subseteq [y]$. By symmetry, $y \equiv x$, and this gives the reverse
inclusion $[y] \subseteq [x]$. Thus, $[x] = [y]$.

Conversely, if $[x] = [y]$, then $x \in [x]$, by reflexivity, and so $x \in [x] = [y]$.
Therefore, $x \equiv y$. ■

Here is a set-theoretic idea, partitions, that we’ll see is intimately involved
with equivalence relations.

Definition. Subsets $A$ and $B$ of a set $X$ are disjoint if $A \cap B = \varnothing$; that is,
no $x \in X$ lies in both $A$ and $B$. A family $\mathcal{P}$ of subsets of a set $X$ is called
pairwise disjoint if, for all $A, B \in \mathcal{P}$, either $A = B$ or $A \cap B = \varnothing$.

A partition of a set $X$ is a family $\mathcal{P}$ of nonempty pairwise disjoint subsets,
called blocks, whose union is $X$.

We are now going to prove that equivalence relations and partitions are
merely different ways of viewing the same thing.

Proposition A.17. If $\equiv$ is an equivalence relation on a nonempty set $X$, then
the equivalence classes form a partition of $X$. Conversely, given a partition $\mathcal{P}$
of $X$, there is an equivalence relation on $X$ whose equivalence classes are the
blocks in $\mathcal{P}$.

Proof. Assume that an equivalence relation $\equiv$ on $X$ is given. Each $x \in X$ lies
in the equivalence class $[x]$ because $\equiv$ is reflexive; it follows that the equivalence
classes are nonempty subsets whose union is $X$. To prove pairwise disjointness,
assume that $a \in [x] \cap [y]$, so that $a \equiv x$ and $a \equiv y$. By symmetry,
$x \equiv a$, and so transitivity gives $x \equiv y$. Therefore, $[x] = [y]$, by Lemma A.16,
and so the equivalence classes form a partition of $X$.

Conversely, let $\mathcal{P}$ be a partition of $X$. If $x, y \in X$, define $x \equiv y$ if there is
$A \in \mathcal{P}$ with $x \in A$ and $y \in A$. It is plain that $\equiv$ is reflexive and symmetric.
To see that $\equiv$ is transitive, assume that $x \equiv y$ and $y \equiv z$; that is, there are
$A, B \in \mathcal{P}$ with $x, y \in A$ and $y, z \in B$. Since $y \in A \cap B$, pairwise disjointness
gives $A = B$ and so $x, z \in A$; that is, $x \equiv z$. We have shown that $\equiv$ is an
equivalence relation.

It remains to show that the equivalence classes are the blocks in $\mathcal{P}$. If $x \in X$,
then $x \in A$ for some $A \in \mathcal{P}$. By the definition of $\equiv$, if $y \in A$, then $y \equiv x$
and $y \in [x]$; hence, $A \subseteq [x]$. For the reverse inclusion, let $z \in [x]$, so that
$z \equiv x$. There is some $B$ with $x \in B$ and $z \in B$; thus, $x \in A \cap B$. By pairwise
disjointness, $A = B$, so that $z \in A$, and $[x] \subseteq A$. Hence, $[x] = A$. ■
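
Both directions of the proposition can be traced on a small example. In this Python sketch (an illustration under our own naming: `classes`, `relation_from_partition`, and `congruent_mod_3` are not from the text), we start with congruence mod 3 on $\{0, 1, \dots, 9\}$, form its equivalence classes, and then recover the same blocks from the relation that the partition defines.

```python
def classes(X, related):
    """The set of equivalence classes [x] = {y in X : y related to x}."""
    return {frozenset(y for y in X if related(y, x)) for x in X}

def relation_from_partition(blocks):
    """x is related to y when some block contains both."""
    return lambda x, y: any(x in A and y in A for A in blocks)

def congruent_mod_3(x, y):
    return (x - y) % 3 == 0

X = range(10)
P = classes(X, congruent_mod_3)
print(sorted(sorted(block) for block in P))
# [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]

# The blocks are nonempty, pairwise disjoint, and their union is X.
print(all(P_i and (P_i == P_j or not (P_i & P_j)) for P_i in P for P_j in P))  # True
print(set().union(*P) == set(X))                                               # True

# Conversely, the relation built from the partition has the same classes.
print(classes(X, relation_from_partition(P)) == P)                             # True
```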

Corollary A.18. If $\equiv$ is an equivalence relation on a set $X$ and $a, b \in X$, then
$[a] \cap [b] \ne \varnothing$ implies $[a] = [b]$.

Example A.19. (i) If $\equiv$ is the identity relation on a set $X$, then the blocks
are the one-point subsets of $X$.

(ii) Let $X = [0, 2\pi]$, and define the partition of $X$ whose blocks are $\{0, 2\pi\}$
and the singletons $\{x\}$, where $0 < x < 2\pi$. This partition identifies the
endpoints of the interval (and nothing else), and so we may regard this as
a construction of a circle. ▲

Exercises 

A.14 Let X = {rock, paper, scissors}. Recall the game whose rules are: paper domi- 
nates rock, rock dominates scissors, and scissors dominates paper. Draw a subset 
of A x A showing that domination is a relation on A . 

A.15 Which of the following relations are equivalence relations? State your reasons. 

(i) The relation < on R. 

(ii) The relation R on Z given by m R n if m - n is odd. 

(iii) The relation R on Z given by m R n if m - n is even. 

(iv) The relation on a group of people of having a common friend. 

A.16 Let f: X → Y be a function. Define a relation on X by x ≡ x' if f(x) = f(x'). 
Prove that ≡ is an equivalence relation. If x ∈ X and f(x) = y, the equivalence 
class [x] is denoted by f^{-1}(y); it is called the fiber over y. 

A.17 (i) Find the error in the following argument that claims to prove that a 
symmetric and transitive relation R on a set X must be reflexive; that is, R 
is an equivalence relation on X. If x ∈ X and xRy, then symmetry gives yRx 
and transitivity gives xRx. 

Hint. What is y? 

(ii) Give an example of a symmetric and transitive relation on the closed unit 
interval X = [0, 1] that is not reflexive. 

A.3 Vector Spaces 

Linear algebra is the study of vector spaces and their homomorphisms (namely, 
linear transformations), with applications to systems of linear equations. We 
assume that most readers have had a course involving matrices with real en- 
tries. Such courses deal mainly with computational aspects of the subject, such 
as Gaussian elimination, finding inverses, and determinants, but here we do not 
emphasize this important aspect of linear algebra. Instead, we focus on vector 
spaces with only a few words about linear transformations. 

Introductory linear algebra courses begin with vector spaces whose scalars 
are real numbers but, toward the end of the course, scalars are allowed to 
be complex numbers. The instructor usually says that the results about vector 
spaces over R hold, more generally, for vector spaces over C. This hand-waving 
bothers most students. We are now going to generalize the definition 
of vector space so that scalars may belong to any field k, and we will prove 
that the usual theorems about vector spaces over R do, in fact, hold not only 
for vector spaces over C but for vector spaces over k. In particular, they hold 
for vector spaces over Q or over F_q. 

The first definitions do not change when we allow more general scalars. 

Definition. If k is a field, then a vector space over k is a set V equipped with 
addition V × V → V, denoted by (u, v) ↦ u + v, that satisfies 

(i) (u + v) + w = u + (v + w) for all u, v, w ∈ V, 

(ii) there is 0 ∈ V with 0 + v = v for all v ∈ V, 

(iii) for each v ∈ V, there is -v ∈ V with -v + v = 0, 

(iv) u + v = v + u for all u, v ∈ V; 

and scalar multiplication k × V → V, denoted by (a, v) ↦ av, that satisfies, 
for all a, b, 1 ∈ k and all u, v ∈ V, 

(i) a(u + v) = au + av, 

(ii) (a + b)v = av + bv, 

(iii) (ab)v = a(bv), 

(iv) 1v = v. 

The elements of V are called vectors and the elements of k are called 
scalars. It is not difficult to prove that the vector -v in the third axiom of 
addition is equal to the scalar product (-1)v. 
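
For a small finite field the axioms can be checked by brute force. The sketch below is an illustration under stated assumptions: we take k = F_5 (integers mod 5), V = k^2 with coordinatewise operations, and the helper names add, smul, and axioms_hold are hypothetical. It tests all eight axioms, and also that (-1)v is the additive inverse of v.

```python
from itertools import product

p, n = 5, 2                         # assumed example: V = (F_5)^2
F = range(p)
V = list(product(F, repeat=n))
zero = (0,) * n

def add(u, v):
    """Coordinatewise addition mod p."""
    return tuple((a + b) % p for a, b in zip(u, v))

def smul(a, v):
    """Scalar multiplication mod p."""
    return tuple((a * x) % p for x in v)

def axioms_hold():
    for u, v, w in product(V, repeat=3):          # associativity
        if add(add(u, v), w) != add(u, add(v, w)):
            return False
    for u, v in product(V, repeat=2):             # commutativity
        if add(u, v) != add(v, u):
            return False
    for v in V:
        if add(zero, v) != v or smul(1, v) != v:  # zero vector and 1v = v
            return False
        if add(smul(p - 1, v), v) != zero:        # (-1)v + v = 0
            return False
    for a, b in product(F, repeat=2):
        for u, v in product(V, repeat=2):
            if smul(a, add(u, v)) != add(smul(a, u), smul(a, v)):
                return False
            if smul((a + b) % p, v) != add(smul(a, v), smul(b, v)):
                return False
            if smul((a * b) % p, v) != smul(a, smul(b, v)):
                return False
    return True
```

Such a check proves nothing about general fields, of course; it only confirms that this one example satisfies the definition.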


Etymology. The word vector comes from the Latin word meaning “to carry;” 
vectors in Euclidean space carry the data of length and direction. The word 
scalar comes from regarding v ↦ av as a change of scale. The terms scale 
and scalar come from the Latin word meaning “ladder,” for the rungs of a 
ladder are evenly spaced. 


Example A.20. (i) Euclidean space V = R^n is a vector space over R. Vectors 
are n-tuples (a_1, ..., a_n), where a_i ∈ R for all i. Picture a vector v 
as an arrow from the origin to the point having coordinates (a_1, ..., a_n). 
Addition is given by 

    (a_1, ..., a_n) + (b_1, ..., b_n) = (a_1 + b_1, ..., a_n + b_n); 

geometrically, the sum of two vectors is described by the parallelogram 
law (see Figure A.2 on page 422). 

Scalar multiplication is given by 

    av = a(a_1, ..., a_n) = (aa_1, ..., aa_n). 

Scalar multiplication v ↦ av “stretches” v by a factor |a|, reversing its 
direction when a is negative (we put quotes around stretches because av 
is shorter than v when |a| < 1). 

(ii) The example in part (i) can be generalized. If k is any field, define V = 
k^n, the set of all n × 1 column vectors v = (a_1, ..., a_n), where a_i ∈ k 
for all i. Addition is given by 

    (a_1, ..., a_n) + (b_1, ..., b_n) = (a_1 + b_1, ..., a_n + b_n), 

and scalar multiplication is given by 

    av = a(a_1, ..., a_n) = (aa_1, ..., aa_n). 

(iii) The polynomial ring R = k[x], where k is a field, is another example of 
a vector space over k. Vectors are polynomials f, scalars are elements 
a ∈ k, and scalar multiplication gives the polynomial af; that is, if 

    f = b_n x^n + ... + b_1 x + b_0, 

then 

    af = ab_n x^n + ... + ab_1 x + ab_0. 

Thus, the polynomial ring k[x] is a vector space over k. 

(iv) Let R be a commutative ring and let k be a subring; if k is a field, then 
R can be viewed as a vector space over k. Regard the elements of R as 
vectors and the elements of k as scalars; define scalar multiplication av, 
where a ∈ k and v ∈ R, to be the given product of two elements in R. 
The axioms in the definition of vector space are just particular cases of 
axioms holding in the commutative ring R. For example, if a field k is a 
subfield of a larger field E, then E is a vector space over k; in particular, 
C is a vector space over R, and it is also a vector space over Q. 

(v) The set C[0, 1] of all continuous real-valued functions on the closed 
interval [0, 1] is a vector space over R with the usual operations: if f, g ∈ 
C[0, 1] and c ∈ R, then 

    f + g: a ↦ f(a) + g(a) 
    cf: a ↦ cf(a). ▲ 

Informally, a subspace of a vector space V is a nonempty subset of V that 
is closed under addition and scalar multiplication in V. 

Definition. If V is a vector space over a field k, then a subspace of V is a 
subset U of V such that 

(i) 0 ∈ U, 

(ii) u, u' ∈ U imply u + u' ∈ U, 

(iii) u ∈ U and a ∈ k imply au ∈ U. 

Every subspace U of a vector space V is itself a vector space. For example, 
since u_1 + (u_2 + u_3) = (u_1 + u_2) + u_3 holds for all vectors u_1, u_2, u_3 ∈ V, 
it holds, in particular, for all vectors u_1, u_2, u_3 ∈ U. 


Example A.21. (i) The extreme cases U = V and U = {0} (where {0} denotes 
the subset consisting of the zero vector alone) are always subspaces 
of a vector space. A subspace U ⊆ V with U ≠ V is called a proper 
subspace of V; we may write U ⊊ V to denote U being a proper subspace 
of V. 

(ii) If v = (a_1, ..., a_n) is a nonzero vector in R^n, then the line through the 
origin, 

    ℓ = {av : a ∈ R}, 

is a subspace of R^n. 

Similarly, a plane through the origin consists of all vectors of the form 
a v_1 + b v_2, where v_1, v_2 is a fixed pair of noncollinear vectors, and a, b 
vary over R. It is easy to check that planes through the origin are 
subspaces of R^n. 

(iii) If k is a field, then a homogeneous linear system over k of m equations 
in n unknowns is a set of equations 

    a_11 x_1 + ... + a_1n x_n = 0 
    a_21 x_1 + ... + a_2n x_n = 0 
      ...
    a_m1 x_1 + ... + a_mn x_n = 0, 

where a_ji ∈ k. A solution of this system is an n × 1 column vector c = 
(c_1, ..., c_n) ∈ k^n, where Σ_i a_ji c_i = 0 for all j; a solution (c_1, ..., c_n) is 
nontrivial if some c_i ≠ 0. The set of all solutions forms a subspace of k^n, 
called the solution space (or nullspace) of the system. Using matrices, we 
can say this more succinctly: if A = [a_ij] is the m × n coefficient matrix, 
then the linear system is Ax = 0 and a solution is an n × 1 column vector c 
for which Ac = 0. 

In particular, we can solve systems of linear equations over F_p, where 
p is a prime. This says that we can treat a not necessarily homogeneous 
system of congruences mod p just as one treats an ordinary system of 
equations. 

For example, the system of congruences 

    3x - 2y + z ≡ 1 mod 7 
    x + y - 2z ≡ 0 mod 7 
    -x + 2y + z ≡ 4 mod 7 

can be regarded as a system of equations over the field F_7. The system 
can be solved just as in high school, for inverses mod 7 are now known: 
[2][4] = [1]; [3][5] = [1]; [6][6] = [1]. The solution is 

    (x, y, z) = ([5], [4], [1]). ▲ 
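
The high school method carries over to a computer with no change except that division is replaced by multiplication by an inverse mod p. The following sketch is one possible implementation, not the only one: the function name solve_mod_p is ours, it assumes the coefficient matrix is square and invertible mod p, and it uses Python's pow(a, -1, p) for inverses (available in Python 3.8 and later).

```python
def solve_mod_p(A, b, p):
    """Solve Ax = b over F_p by Gauss-Jordan elimination.

    Assumes A is a square matrix that is invertible mod the prime p.
    """
    n = len(A)
    # Augmented matrix [A | b], reduced mod p.
    M = [[x % p for x in row] + [rhs % p] for row, rhs in zip(A, b)]
    for c in range(n):
        # Find a row with a nonzero entry in column c and move it up.
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        inv = pow(M[c][c], -1, p)               # inverse of the pivot mod p
        M[c] = [(x * inv) % p for x in M[c]]
        # Clear column c in every other row.
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [(x - f * y) % p for x, y in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]

A = [[3, -2, 1], [1, 1, -2], [-1, 2, 1]]
b = [1, 0, 4]
solution = solve_mod_p(A, b, 7)     # the system of congruences above
```

For the system above, the result is [5, 4, 1], in agreement with the solution found by hand.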

Bases and Dimension 

The key observation in getting the “right” definition of dimension is to 
understand why R^3 is 3-dimensional. Every vector (x, y, z) is a linear combination 
of the three vectors e_1 = (1, 0, 0), e_2 = (0, 1, 0), and e_3 = (0, 0, 1); that is, 

    (x, y, z) = x e_1 + y e_2 + z e_3. 

It is not so important that every vector is a linear combination of these specific 
vectors; what is important is that there are three of them, for it turns out that 
three is the smallest number of vectors with this property; that is, one cannot 
find two vectors u = (a, b, c) and u' = (a', b', c') with every vector a linear 
combination of u and u'. 

Definition. A list in a vector space V is an ordered set X = v_1, ..., v_n of 
vectors in V. 

More precisely, we are saying that there is some n ≥ 1 and a function 

    φ: {1, 2, ..., n} → V, 

with φ(i) = v_i for all i. Thus, the subset im φ is ordered in the sense that 
there is a first vector v_1, a second vector v_2, and so forth. A vector may appear 
several times on a list; that is, φ need not be injective. 

Definition. Let V be a vector space over a field k. A k-linear combination of 
a list v_1, ..., v_n in V is a vector v of the form 

    v = a_1 v_1 + ... + a_n v_n, 

where a_i ∈ k for all i. 

We often write linear combination instead of k-linear combination if it 
is clear where the scalar coefficients live. 

Definition. If X = v_1, ..., v_m is a list in a vector space V, then 

    Span(X) = ⟨v_1, ..., v_m⟩, 

the set of all the k-linear combinations of v_1, ..., v_m, is called the subspace 
spanned by X. We also say that v_1, ..., v_m spans Span(v_1, ..., v_m). 

It is easy to check that Span(v_1, ..., v_m) is, indeed, a subspace. 

Lemma A.22. Let V be a vector space over a field k. 

(i) Every intersection of subspaces of V is itself a subspace. 

(ii) If X = v_1, ..., v_m is a list in V, then the intersection of all the 
subspaces of V containing X is Span(v_1, ..., v_m), the subspace spanned 
by v_1, ..., v_m, and so Span(v_1, ..., v_m) is the smallest subspace of V 
containing X. 

Proof. Part (i) is routine. For part (ii), let X = {v_1, ..., v_m} and let 𝒮 denote 
the family of all the subspaces of V containing X; we claim that 

    ⋂_{S ∈ 𝒮} S = Span(v_1, ..., v_m). 

The inclusion ⊆ is clear, because S = Span(v_1, ..., v_m) ∈ 𝒮. For the reverse 
inclusion, note that if S ∈ 𝒮, then S contains v_1, ..., v_m, and so it contains 
the set of all k-linear combinations of v_1, ..., v_m, namely, Span(v_1, ..., v_m). ■ 


The next observation is important. 

Corollary A.23. The subspace spanned by a list X = v_1, ..., v_m does not 
depend on the ordering of the vectors, but only on the set of vectors themselves. 

Proof. This follows from part (ii) of Lemma A.22. ■ 

See Exercise A.24 on page 440 to see other properties of a list that do not 
depend on the ordering of its vectors. 

If X = ∅, then Span(X) = ⋂_{S ∈ 𝒮} S, where 𝒮 is the family of all the 
subspaces of V containing X. Now {0} ⊆ Span(∅) = ⋂_{S ∈ 𝒮} S, for {0} is 
contained in every subspace S of V. For the reverse inclusion, one of the 
subspaces S of V occurring in the intersection is {0} itself, and so Span(∅) = 
⋂_{S ∈ 𝒮} S ⊆ {0}. Therefore, Span(∅) = {0}. 

Example A.24. (i) Let V = R^2, let e_1 = (1, 0), and let e_2 = (0, 1). Now 
V = Span(e_1, e_2), for if v = (a, b) ∈ V, then 

    v = (a, 0) + (0, b) 
      = a(1, 0) + b(0, 1) 
      = a e_1 + b e_2 ∈ Span(e_1, e_2). 

(ii) If k is a field and V = k^n, define e_i as the n × 1 column vector having 1 in 
the ith coordinate and 0s elsewhere. The reader may adapt the argument 
in part (i) to show that e_1, ..., e_n spans k^n. 

(iii) A vector space V need not be spanned by a finite list. For example, let 
V = k[x], and suppose that X = f_1(x), ..., f_m(x) is a finite list in V. 
If d is the largest degree of any of the f_i, then every (nonzero) k-linear 
combination of f_1, ..., f_m has degree at most d. Thus, x^{d+1} is not a 
k-linear combination of vectors in X, and so X does not span k[x]. ▲ 

The following definition makes sense even though we have not yet defined 
dimension. 

Definition. A vector space V is called finite-dimensional if it is spanned by a 
finite list; otherwise, V is called infinite-dimensional. 

Part (ii) of Example A.24 shows that k^n is finite-dimensional, while part (iii) 
shows that k[x] is infinite-dimensional. Now C is a vector space over R, and it 
is finite-dimensional (C is spanned by 1, i); by Example A.20(iv), both R and 
C are vector spaces over Q (each can be shown to be infinite-dimensional). 

If a subspace U of a vector space V is finite-dimensional, then there is a 
list v_1, v_2, ..., v_m that spans U. But there are many such lists: if u is a vector 
in U, then the extended list v_1, v_2, ..., v_m, u also spans U. Let us, therefore, 
seek a shortest list that spans U. 

Notation. If v_1, ..., v_m is a list, then v_1, ..., v̂_i, ..., v_m is the shorter list with 
v_i deleted. 

Proposition A.25. If V is a vector space over a field k, then the following 
conditions on a list v_1, ..., v_m spanning V are equivalent: 

(i) v_1, ..., v_m is not a shortest spanning list; 

(ii) some v_i is in the subspace spanned by the others; 

(iii) there are scalars a_1, ..., a_m, not all zero, with 

    Σ_{i=1}^m a_i v_i = 0. 


Proof. (i) ⇒ (ii). If v_1, ..., v_m is not a shortest spanning list, then one of 
the vectors, say v_i, can be thrown out, and the shorter list still spans. Hence, 
v_i ∈ Span(v_1, ..., v̂_i, ..., v_m). 

(ii) ⇒ (iii). If v_i = Σ_{j≠i} c_j v_j, define a_i = -1 ≠ 0 and a_j = c_j for all 
j ≠ i. 

(iii) ⇒ (i). The given equation implies that one term, say a_i v_i, is nonzero. 
Since k is a field, a_i^{-1} exists, and 

    v_i = (-a_i^{-1}) Σ_{j≠i} a_j v_j. (A.1) 

Deleting v_i gives a shorter list that still spans V: write any v ∈ V as a linear 
combination of all the v_j (including v_i); now substitute the expression 
Eq. (A.1) for v_i and collect terms. ■ 

We now give a name to lists described in Proposition A.25. 

Definition. A list X = v_1, ..., v_m in a vector space V is linearly dependent 
if there are scalars a_1, ..., a_m, not all zero, with Σ_{i=1}^m a_i v_i = 0; otherwise, 
X is called linearly independent. 

The empty set ∅ is defined to be linearly independent (we interpret ∅ as a 
list of length 0). 

Example A.26. (i) A list X = v_1, ..., v_m containing the zero vector is 
linearly dependent: if v_j = 0, then Σ_i a_i v_i = 0, where a_j = 1 and a_i = 0 
for i ≠ j. 

(ii) A list v_1 of length 1 is linearly dependent if and only if v_1 = 0; hence, a 
list v_1 of length 1 is linearly independent if and only if v_1 ≠ 0. 




(iii) A list v_1, v_2 is linearly dependent if and only if one of the vectors is 
a scalar multiple of the other: if a_1 v_1 + a_2 v_2 = 0 and a_1 ≠ 0, then 
v_1 = -(a_2/a_1) v_2. Conversely, if v_2 = c v_1, then c v_1 - v_2 = 0 and the 
list v_1, v_2 is linearly dependent (for the coefficient -1 of v_2 is nonzero). 

(iv) If there is a repetition in the list v_1, ..., v_m (that is, if v_i = v_j for some 
i ≠ j), then v_1, ..., v_m is linearly dependent: define c_i = 1, c_j = -1, 
and all other c = 0. Therefore, if v_1, ..., v_m is linearly independent, then 
all the vectors v_i are distinct. ▲ 

Linear independence has been defined indirectly, as not being linearly de- 
pendent. Because of the importance of linear independence, let us define it 
directly. 

Definition. A list v_1, ..., v_m is linearly independent if, whenever a k-linear 
combination Σ_{i=1}^m a_i v_i = 0, then every a_i = 0. 

It follows that every sublist of a linearly independent list is itself linearly 
independent (this is one reason for decreeing that ∅ be linearly independent). 
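
Over a finite field, the direct definition can be tested literally, by running through every choice of coefficients. The sketch below does this for lists in (F_p)^n, where p is a prime; the names combo and is_independent are ours, and the brute-force search is chosen for transparency, not speed.

```python
from itertools import product

def combo(coeffs, vectors, p):
    """The linear combination sum(a_i * v_i), computed over F_p."""
    n = len(vectors[0])
    return tuple(sum(a * v[j] for a, v in zip(coeffs, vectors)) % p
                 for j in range(n))

def is_independent(vectors, p):
    """True if the only combination equal to 0 has every a_i = 0."""
    if not vectors:
        return True                 # the empty list is independent by decree
    zero = (0,) * len(vectors[0])
    return all(combo(c, vectors, p) != zero
               for c in product(range(p), repeat=len(vectors))
               if any(c))           # skip the all-zero choice of coefficients
```

For example, is_independent([(1, 2, 0), (2, 1, 0)], 3) is False, because (2, 1, 0) = 2(1, 2, 0) in (F_3)^3, as in Example A.26(iii); a list with a repetition or containing the zero vector is also rejected.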

Corollary A.27. If X = v_1, ..., v_m is a list spanning a vector space V, then 
X is a shortest spanning list if and only if X is linearly independent. 

Proof. These are just the contrapositives of (i) ⇒ (iii) and (iii) ⇒ (i) in 
Proposition A.25. ■ 

We have arrived at the notion we have been seeking. 

Definition. A basis of a vector space V is a linearly independent list that 
spans V. 

Thus, bases are shortest spanning lists. Of course, all the vectors in a linearly 
independent list v_1, ..., v_n are distinct, by Example A.26(iv). 

Example A.28. In Example A.24(ii), we saw that X = e_1, ..., e_n spans k^n, 
where e_i is the n × 1 column vector having 1 in the ith coordinate and 0s 
elsewhere. We now show that X is linearly independent. If 0 = Σ_i c_i e_i, then 

      c_1 e_1 =   (c_1, 0, 0, ..., 0) 
    + c_2 e_2 = + (0, c_2, 0, ..., 0) 
        ...
    + c_n e_n = + (0, 0, 0, ..., c_n) 
            0 =   (c_1, c_2, ..., c_n). 

Hence, c_i = 0 for all i, X is linearly independent, and X is a basis; it is called 
the standard basis of k^n. ▲ 

Proposition A.29. A list X = v_1, ..., v_n in a vector space V over a field k is 
a basis of V if and only if each v ∈ V has a unique expression as a k-linear 
combination of the vectors in X. 

Proof. If X is a basis, it spans V, and so each vector v ∈ V is a k-linear 
combination: v = Σ a_i v_i. If also v = Σ b_i v_i, then Σ (a_i - b_i) v_i = 0, and 
linear independence gives a_i = b_i for all i; that is, the expression is unique. 

Conversely, existence of an expression shows that the list X spans V. Moreover, 
if 0 = Σ c_i v_i with not all c_i = 0, then the vector 0 does not have a 
unique expression as a linear combination of the v_i: a second expression is 
0 = Σ a_i v_i with all a_i = 0. ■ 

Definition. If X = v_1, ..., v_n is a basis of a vector space V and v ∈ V, then 
Proposition A.29 says that there are unique scalars a_1, ..., a_n with 

    v = Σ_{i=1}^n a_i v_i. 

The n-tuple (a_1, ..., a_n) is called the coordinate list of a vector v ∈ V relative 
to the basis X. 

If v_1, ..., v_n is the standard basis of V = k^n, then this coordinate list 
coincides with the usual coordinate list. 


How to Think About It. If v_1, ..., v_n is a basis of a vector space V over a 
field k, then each vector v ∈ V has a unique expression 

    v = a_1 v_1 + a_2 v_2 + ... + a_n v_n, 

where a_i ∈ k for all i. Since there is a first vector v_1, a second vector v_2, 
and so forth, the coefficients in this k-linear combination determine a unique 
n-tuple (a_1, a_2, ..., a_n). Were a basis merely a subset of V and not a list (i.e., 
an ordered subset), then there would be n! coordinate lists for every vector. But 
see Exercise A.24(iv) on page 440. 


We are going to define the dimension of a vector space V to be the number of 
vectors in a basis. Two questions arise at once. 

(i) Does every vector space have a basis? 

(ii) Do all bases of a vector space have the same number of elements? 

The first question is easy to answer; the second needs some thought. 

Theorem A.30. Every finite-dimensional vector space V has a basis. 

Proof. A finite spanning list X exists, since V is finite-dimensional. If X is 
linearly independent, it is a basis; if not, Proposition A.25 says that we can 
throw out some element from X, leaving a shorter spanning list, say X'. If 
X' is linearly independent, it is a basis; if not, we can throw out an element 
from X', leaving a shorter spanning sublist. Eventually, we arrive at a shortest 
spanning list, which is linearly independent, by Corollary A.27, and hence it is 
a basis. ■ 
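
The proof is an algorithm: keep discarding a vector that lies in the span of the others until no such vector remains. Here is a sketch of that procedure for lists in (F_p)^n, p a prime. The names span and prune_to_basis are hypothetical, and spans are computed by enumerating all linear combinations, which is practical only for small examples.

```python
from itertools import product

def span(vectors, p, n):
    """All F_p-linear combinations of `vectors` (each of length n)."""
    return {tuple(sum(a * v[j] for a, v in zip(c, vectors)) % p
                  for j in range(n))
            for c in product(range(p), repeat=len(vectors))}

def prune_to_basis(spanning, p, n):
    """Throw out redundant vectors until the list is a shortest spanning list."""
    X = list(spanning)
    i = 0
    while i < len(X):
        rest = X[:i] + X[i + 1:]
        if X[i] in span(rest, p, n):
            X = rest        # X[i] was redundant; the shorter list still spans
        else:
            i += 1          # X[i] is needed; keep it and move on
    return X

basis = prune_to_basis([(1, 0), (2, 0), (0, 1), (1, 1)], 3, 2)
```

In this example over F_3, the four given vectors span (F_3)^2, and the procedure returns the two-element list (0, 1), (1, 1). Which basis is produced depends on the order in which vectors are examined; only its length does not, as Theorem A.33 below shows.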

The definitions of spanning and linear independence can be extended to 
infinite lists in a vector space, and we can then prove that infinite-dimensional 
vector spaces also have bases. For example, it turns out that a basis of k[x] is 
1, x, x^2, ..., x^n, .... 

We can now prove invariance of dimension, perhaps the most important 
result about vector spaces. 

Lemma A.31. Let u_1, ..., u_n, v_1, ..., v_m be elements in a vector space V 
with v_1, ..., v_m ∈ Span(u_1, ..., u_n). If m > n, then v_1, ..., v_m is linearly 
dependent. 

Proof. The proof is by induction on n ≥ 1. 

Base Step. If n = 1, there are at least two vectors v_1, v_2, and v_1 = a_1 u_1 and 
v_2 = a_2 u_1. If u_1 = 0, then v_1 = 0 and the list of v's is linearly dependent 
(by Example A.26(i)). Suppose u_1 ≠ 0. We may assume that v_1 ≠ 0, or we 
are done; hence, a_1 ≠ 0. Therefore, v_1, v_2 is linearly dependent, for 
1·v_2 - a_2 a_1^{-1} v_1 = 0, and hence the larger list v_1, ..., v_m is linearly dependent. 

Write out the proof in the special case m = 3 and n = 2. 

Inductive Step. There are equations 

    v_i = a_{i1} u_1 + ... + a_{in} u_n 

for i = 1, ..., m. We may assume that some a_{i1} ≠ 0, otherwise v_1, ..., v_m ∈ 
⟨u_2, ..., u_n⟩, and the inductive hypothesis applies. Changing notation if 
necessary (that is, by re-ordering the v's), we may assume that a_{11} ≠ 0. For each 
i ≥ 2, define 

    v_i' = v_i - a_{i1} a_{11}^{-1} v_1 ∈ Span(u_2, ..., u_n). 

Each v_i' is a linear combination of the u's, and the coefficient of u_1 is 
a_{i1} - (a_{i1} a_{11}^{-1}) a_{11} = 0. Since m - 1 > n - 1, the inductive hypothesis gives scalars 
b_2, ..., b_m, not all 0, with 

    b_2 v_2' + ... + b_m v_m' = 0. 

Rewrite this equation using the definition of v_i': 

    (- Σ_{i≥2} b_i a_{i1} a_{11}^{-1}) v_1 + b_2 v_2 + ... + b_m v_m = 0. 

Not all the coefficients are 0, and so v_1, ..., v_m is linearly dependent. ■ 

The following familiar fact illustrates the intimate relation between linear 
algebra and systems of linear equations. 

Corollary A.32. If a homogeneous system of linear equations over a field k 
has more unknowns than equations, then it has a nontrivial solution. 

Proof. Recall that an n-tuple (β_1, ..., β_n) is a solution of a system 

    α_11 x_1 + ... + α_1n x_n = 0 
      ...
    α_m1 x_1 + ... + α_mn x_n = 0 

if α_i1 β_1 + ... + α_in β_n = 0 for all i. In other words, if c_1, ..., c_n are the 
columns of the m × n coefficient matrix A = [α_ij] (note that c_i ∈ k^m), then 

    β_1 c_1 + ... + β_n c_n = 0. 

Now k^m can be spanned by m vectors (the standard basis, for example). 
Since n > m, by hypothesis, Lemma A.31 shows that the list c_1, ..., c_n is 
linearly dependent; there are scalars γ_1, ..., γ_n, not all zero, with 
γ_1 c_1 + ... + γ_n c_n = 0. Therefore, (γ_1, ..., γ_n) is a nontrivial solution of the system. ■ 
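
When k = F_p, the corollary can be watched in action, because the finitely many candidates in (F_p)^n can simply be searched. The sketch below is illustrative only: nontrivial_solution is a hypothetical name, the matrix A is an assumed example with 2 equations and 3 unknowns, and an exhaustive search stands in for the elimination one would use in practice.

```python
from itertools import product

def nontrivial_solution(A, p):
    """Search (F_p)^n for a nonzero x with Ax = 0, where A is m x n."""
    n = len(A[0])
    for x in product(range(p), repeat=n):
        if any(x) and all(sum(a * t for a, t in zip(row, x)) % p == 0
                          for row in A):
            return x
    return None                     # no nontrivial solution exists

A = [[1, 2, 3],
     [4, 5, 6]]                     # 2 equations, 3 unknowns
x = nontrivial_solution(A, 7)
```

Corollary A.32 guarantees that the search succeeds whenever there are more columns than rows; here it finds x = (1, 5, 1), and indeed 1 + 10 + 3 = 14 and 4 + 25 + 6 = 35 are both divisible by 7.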

Theorem A.33 (Invariance of Dimension). If X = x_1, ..., x_n and Y = 
y_1, ..., y_m are bases of a vector space V, then m = n. 

Proof. If m ≠ n, then either n < m or m < n. In the first case, y_1, ..., y_m ∈ 
Span(x_1, ..., x_n), because X spans V, and Lemma A.31 gives Y linearly 
dependent, a contradiction. A similar contradiction arises if m < n, and so we 
must have m = n. ■ 

It is now permissible to make the following definition, for all bases of a 
vector space have the same size. 

Definition. If V is a finite-dimensional vector space over a field k, then its 
dimension, denoted by dim_k(V) or by dim(V), is the number of elements in 
a basis of V. 

Corollary A.34. Let k be a finite field with q elements. If V is an n-dimensional 
vector space over k, then |V| = q^n. 

Proof. If v_1, ..., v_n is a basis of V, then every v ∈ V has a unique expression 

    v = c_1 v_1 + ... + c_n v_n, 

where c_i ∈ k for all i. There are q choices for each c_i, and so there are q^n 
vectors in V. ■ 
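
The counting argument is easy to confirm by enumeration when q is prime, so that arithmetic mod q gives the field (the general finite field with q a prime power is not covered by this sketch). The function name span_size is ours.

```python
from itertools import product

def span_size(vectors, q):
    """Number of distinct F_q-linear combinations of `vectors` (q prime)."""
    n = len(vectors[0])
    combos = {tuple(sum(c * v[j] for c, v in zip(cs, vectors)) % q
                    for j in range(n))
              for cs in product(range(q), repeat=len(vectors))}
    return len(combos)
```

For instance, the linearly independent list (1, 0, 2), (0, 1, 1) in (F_3)^3 is a basis of a 2-dimensional subspace, and span_size reports 3^2 = 9 vectors. For a dependent list such as (1, 2), (2, 4) over F_5 the expressions are no longer unique, and only 5 distinct vectors arise from the 25 choices of coefficients.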

Example A.35. (i) Example A.28 shows that k^n has dimension n, which 
agrees with our intuition when k = R: the plane R × R is 2-dimensional, 
and R^3 is 3-dimensional! 

(ii) If V = {0}, then dim(V) = 0, for there are no elements in its basis ∅. 
(This is another good reason for defining ∅ to be linearly independent.) 

(iii) Let I be a finite set with n elements. Define 

    k^I = {functions f: I → k}. 

Now k^I is a vector space if we define addition f + f' to be 

    f + f': i ↦ f(i) + f'(i) 

and scalar multiplication af, for a ∈ k and f: I → k, by 

    af: i ↦ af(i) 

(see Exercise A.18(i) on page 439). It is easy to check that the set of n 
functions of the form f_i, where i ∈ I, defined by 

    f_i(j) = 1 if j = i, 
    f_i(j) = 0 if j ≠ i, 

form a basis, and so dim(k^I) = n = |I|. 

This is not a new example: an n-tuple (a_1, ..., a_n) is really a function 
f: {1, ..., n} → k with f(i) = a_i for all i. Thus, the functions f_i 
comprise the standard basis. ▲ 


Definition. A longest (or maximal) linearly independent list X = u_1, ..., u_m 
is a linearly independent list for which there is no vector v ∈ V such that 
u_1, ..., u_m, v is linearly independent. 

Lemma A.36. If V is a finite-dimensional vector space, then a longest linearly 
independent list X = v_1, ..., v_n is a basis of V. 

Proof. If v_1, ..., v_n is not a basis, then it does not span V, for this list is 
linearly independent. Thus, there is w ∈ V with w ∉ Span(v_1, ..., v_n). But the 
longer list v_1, ..., v_n, w is linearly independent, by Proposition A.25, 
contradicting X being a longest linearly independent list. ■ 

The converse of Lemma A.36 is true; bases are longest linearly independent 
lists. This follows from the next proposition, which is quite useful in its own 
right. 

Proposition A.37. Let V be an n-dimensional vector space. If Z = u_1, ..., u_m 
is a linearly independent list in V, where m < n, then Z can be extended to 
a basis; that is, there are vectors v_{m+1}, ..., v_n ∈ V such that u_1, ..., u_m, 
v_{m+1}, ..., v_n is a basis of V. 

Proof. If the linearly independent list Z does not span V, there is v_{m+1} ∈ V 
with v_{m+1} ∉ Span(u_1, ..., u_m), and the longer list u_1, ..., u_m, v_{m+1} is 
linearly independent, by Proposition A.25. If u_1, ..., u_m, v_{m+1} does not span V, 
there is v_{m+2} ∈ V with v_{m+2} ∉ Span(u_1, ..., u_m, v_{m+1}). Since dim(V) = n, 
Lemma A.31 says that the length of these lists can never exceed n, and so this 
process of adjoining elements v_{m+1}, v_{m+2}, ... must stop. But the only reason 
a list stops is that it spans V; hence, it is a basis. ■ 
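
The adjoining process in this proof can be sketched in code for V = (F_p)^n, p a prime: run through the vectors of V and adjoin each one that lies outside the span of the current list. The names span and extend_to_basis are hypothetical, and the input list Z is assumed to be linearly independent.

```python
from itertools import product

def span(vectors, p, n):
    """All F_p-linear combinations of `vectors` (each of length n)."""
    return {tuple(sum(a * v[j] for a, v in zip(c, vectors)) % p
                  for j in range(n))
            for c in product(range(p), repeat=len(vectors))}

def extend_to_basis(Z, p, n):
    """Adjoin vectors outside the current span until the list spans (F_p)^n.

    Z is assumed to be a linearly independent list in (F_p)^n.
    """
    X = list(Z)
    for v in product(range(p), repeat=n):
        if v not in span(X, p, n):
            X.append(v)     # v is not in Span(X), so the longer list stays independent
    return X

basis = extend_to_basis([(1, 1, 0)], 2, 3)
```

Starting from the single vector (1, 1, 0) in (F_2)^3, the process adjoins (0, 0, 1) and then (0, 1, 0), and stops with a list of length 3 that spans all 8 vectors.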

Corollary A.38. Let V be an n-dimensional vector space. Then a list in V is 
a basis if and only if it is a longest linearly independent list. 

Proof. Lemma A.36 shows that longest linearly independent lists are bases. 
Conversely, if X is a basis, it must be a longest linearly independent list: 
otherwise, Proposition A.37 says we could lengthen X to obtain a basis of V which 
is too long. ■ 

We now paraphrase Lemma A.31. 

Corollary A.39. If dim(V) = n, then a list of n + 1 or more vectors is linearly 
dependent. 

Proof. Otherwise, the list could be extended to a basis having too many 
elements. ■ 

Corollary A.40. Let V be a vector space with dim(V) = n. 

(i) A list of n vectors that spans V must be linearly independent. 

(ii) Any linearly independent list of n vectors must span V. 

In either case, the list is a basis of V. 

Proof. (i) Were the list linearly dependent, then it could be shortened to give 
a basis, and this basis is too small. 

(ii) If the list does not span, then it could be lengthened to give a basis, and 
this basis is too large. ■ 

Corollary A.41. Let U be a subspace of a vector space V of dimension n. 

(i) U is finite-dimensional. 

(ii) dim(U) ≤ dim(V). 

(iii) If dim(U) = dim(V), then U = V. 

Proof. (i) If U = {0}, there is nothing to prove; otherwise, take a nonzero 
u_1 ∈ U. If U = Span(u_1), then U is finite-dimensional. If 
U ≠ Span(u_1), there is u_2 ∈ U with u_2 ∉ Span(u_1). By Proposition A.25, u_1, u_2 is 
linearly independent. If U = Span(u_1, u_2), we are done; if not, there is 
u_3 ∈ U with u_3 ∉ Span(u_1, u_2), and the list u_1, u_2, u_3 is linearly independent. This 
process cannot be repeated n + 1 times, for then u_1, ..., u_{n+1} would be 
a linearly independent list in U ⊆ V, contradicting Corollary A.39. 

(ii) A basis of U is linearly independent, and so it can be extended to a basis 
of V. 

(iii) If dim(U) = dim(V), then a basis of U is already a basis of V (otherwise 
it could be extended to a basis of V that would be too large). ■ 

Linear Transformations 

Linear transformations are homomorphisms of vector spaces; they are really 
much more important than vector spaces, but vector spaces are needed in order 
to define them, and bases of vector spaces are needed to describe them by 
matrices. (You are surely familiar with the next definition, at least for k = R.) 

Definition. Let V and W be vector spaces over a field k. A linear 
transformation is a function T: V → W such that, for all vectors v, v' ∈ V and scalars 
a ∈ k, we have 

(i) T(v + v') = T(v) + T(v') 

(ii) T(av) = aT(v). 

It follows by induction on n ≥ 1 that linear transformations preserve linear 
combinations: 

    T(a_1 v_1 + ... + a_n v_n) = a_1 T(v_1) + ... + a_n T(v_n). 

You’ve certainly seen many examples of linear transformations in a linear 
algebra course. Here are a few more. 

Example A.42. (i) If A is an m × n matrix, then x ↦ Ax, where x is an 
n × 1 column vector, is a linear transformation k^n → k^m. 

(ii) If we regard the complex numbers C as a 2-dimensional vector space 
over R, then complex conjugation T: z ↦ z̄ is a linear transformation. 

(iii) Let V = k[x], where k is a field. If a ∈ k, then evaluation e_a: k[x] → k 
is a linear transformation (we can view k as a 1-dimensional vector space 
over itself). 

(iv) Integration f ↦ ∫_0^1 f(x) dx is a linear transformation C[0, 1] → R (see 
Example A.20(v)). ▲ 

We now associate matrices to linear transformations. 

Theorem A.43. Let V and W be vector spaces over a field k. If v_1, ..., v_n is a 
basis of V and w_1, ..., w_n is a list of elements in W (possibly with repetitions), 
then there exists a unique linear transformation T: V → W with T(v_j) = w_j 
for all j. 

Proof. By Proposition A.29, every vector v ∈ V has a unique expression as a 
linear combination of basis vectors: 

    v = a_1 v_1 + ... + a_n v_n. 

Therefore, there is a well-defined function T: V → W with T(v_j) = w_j for 
all j, namely 

    T(a_1 v_1 + ... + a_n v_n) = a_1 w_1 + ... + a_n w_n. 

It is routine to check that T is a linear transformation. If v' = a_1' v_1 + ... + a_n' v_n, 
then 

    v + v' = (a_1 + a_1') v_1 + ... + (a_n + a_n') v_n, 

and 

    T(v + v') = (a_1 + a_1') w_1 + ... + (a_n + a_n') w_n 
              = (a_1 w_1 + ... + a_n w_n) + (a_1' w_1 + ... + a_n' w_n) 
              = T(v) + T(v'). 

Similarly, 

    T(av) = T(a(a_1 v_1 + ... + a_n v_n)) 
          = T(aa_1 v_1 + ... + aa_n v_n) 
          = aa_1 w_1 + ... + aa_n w_n 
          = a(a_1 w_1 + ... + a_n w_n) 
          = aT(v). 

To prove uniqueness, suppose that S: V → W is a linear transformation 
with S(v_j) = w_j for all j. Since S preserves linear combinations, 

    S(a_1 v_1 + ... + a_n v_n) = a_1 S(v_1) + ... + a_n S(v_n) 
                               = a_1 w_1 + ... + a_n w_n 
                               = T(a_1 v_1 + ... + a_n v_n), 

and so S = T. ■ 
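
The construction in the proof, "find the coordinates of v, then use the same coefficients on the w's," can be sketched over F_p, p a prime. In the code below, coordinates and extend_linearly are hypothetical names, coordinates are found by exhaustive search (adequate only for tiny examples), and the basis and images are an assumed example in which the images repeat.

```python
from itertools import product

def coordinates(v, basis, p):
    """Coefficients (a_1, ..., a_n) with v = sum a_i * basis[i] over F_p."""
    n = len(v)
    for a in product(range(p), repeat=len(basis)):
        if tuple(sum(ai * b[j] for ai, b in zip(a, basis)) % p
                 for j in range(n)) == tuple(v):
            return a        # unique when `basis` really is a basis
    raise ValueError("v is not in the span of the given list")

def extend_linearly(basis, images, p):
    """Return the function T with T(basis[j]) = images[j], extended by linearity."""
    m = len(images[0])
    def T(v):
        a = coordinates(v, basis, p)
        return tuple(sum(ai * w[j] for ai, w in zip(a, images)) % p
                     for j in range(m))
    return T

basis = [(1, 1), (0, 1)]                 # a basis of (F_5)^2
images = [(2, 0, 1), (2, 0, 1)]          # repetitions among the w's are allowed
T = extend_linearly(basis, images, 5)    # T: (F_5)^2 -> (F_5)^3
```

Since (1, 0) = 1(1, 1) + 4(0, 1) over F_5, this T sends (1, 0) to (2, 0, 1) + 4(2, 0, 1) = (0, 0, 0); one can also check directly that T is additive.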
Definition. Let T: V → W be a linear transformation. Given bases v_1, ..., v_n 
and w_1, ..., w_m of V and W, respectively, each T(v_j) is a linear combination 
of the w's: 

    T(v_j) = a_1j w_1 + ... + a_mj w_m. 

The m × n matrix A = [a_ij] whose jth column is a_1j, ..., a_mj, the coordinate 
list of T(v_j) with respect to the w's, is called the matrix associated to T. 

Example A.44. As in Example A.42(ii), view C as R^2. A basis is 1, i; that 
is, (1, 0), (0, 1). Since the conjugate of 1 is 1 = (1, 0) and the conjugate of i 
is -i = (0, -1), the matrix of complex conjugation relative to the basis is 

    [ 1   0 ] 
    [ 0  -1 ]. ▲ 



Of course, the matrix A associated to a linear transformation T: V → W 
depends on the choices of bases of V and of W. 

The next theorem shows why the notation a_1j w_1 + ... + a_mj w_m is chosen 
instead of a_j1 w_1 + ... + a_jm w_m. 

Theorem A.45. If $T\colon k^n \to k^m$ is a linear transformation, then

$$T(v) = Av,$$

where $A$ is the matrix associated to $T$ from the standard bases of $k^n$ and $k^m$,
$v$ is an $n \times 1$ column vector, and $Av$ is matrix multiplication.

Proof. If $A$ is an $m \times n$ matrix and $v_j$ is the $n \times 1$ column vector whose $j$th
entry is 1 and whose other entries are 0, then $A v_j$ is the $j$th column of $A$. Thus,
$A v_j = T(v_j)$ for all $j$, and so $Av = T(v)$ for all $v \in k^n$, by Exercise A.29 on
page 441. ■
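The recipe in Theorem A.45 can be tried out numerically. The following is a minimal Python sketch, assuming vectors are represented as lists of numbers; the helper names `matrix_of` and `apply_matrix` are illustrative and do not come from the text. It builds the matrix whose $j$th column is $T(e_j)$ and compares $T(v)$ with $Av$ for complex conjugation on $\mathbb{R}^2$.

```python
def matrix_of(T, n):
    """Return the matrix (as a list of rows) whose j-th column is T(e_j)."""
    columns = []
    for j in range(n):
        e_j = [1 if i == j else 0 for i in range(n)]  # j-th standard basis vector
        columns.append(T(e_j))
    m = len(columns[0])
    # Transpose the list of columns into a list of rows.
    return [[columns[j][i] for j in range(n)] for i in range(m)]


def apply_matrix(A, v):
    """Matrix-vector product Av, with v treated as a column vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def conjugate(v):
    """Complex conjugation on R^2: (x, y) -> (x, -y)."""
    x, y = v
    return [x, -y]


A = matrix_of(conjugate, 2)
v = [3, 4]
same = apply_matrix(A, v) == conjugate(v)  # compares Av with T(v)
```

Any other linear map given as a Python function on lists could be passed to `matrix_of` in the same way.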

In Appendix A.1, we defined functions to be equal if they have the same
domain, same target, and same graph. It is natural to require the same domain 
and the same graph, but why should we care about the target? The coming 
discussion gives a (persuasive) reason why targets are important. 

Definition. Let $V$ be a vector space over $k$, and regard $k$ as a 1-dimensional
vector space over itself. A functional on $V$ is a linear transformation
$f\colon V \to k$. The dual space $V^*$ of a vector space $V$ is the set of all functionals
on $V$.

It is shown in Example A.42 that evaluation and integration give rise to 
functionals. 

Proposition A.46. Let $T\colon V \to W$ be a linear transformation, where $V$ and $W$
are vector spaces over a field $k$.

(i) $V^*$ is a vector space over $k$.

(ii) If $f \in W^*$, then the composite $f \circ T$ is in $V^*$.

(iii) The function $T^*\colon W^* \to V^*$, given by $T^*\colon f \mapsto f \circ T$, is a linear
transformation.

Use Proposition A.2 to prove $(f + g) \circ T = f \circ T + g \circ T$: just
evaluate both sides on $v \in V$.

Proof. (i) This is Exercise A.30 on page 441.

(ii) Since $T\colon V \to W$ and $f\colon W \to k$ are linear transformations, so is their
composite $f \circ T\colon V \to k$; that is, $f \circ T$ is a functional on $V$.

(iii) That $f \mapsto f \circ T$ is a linear transformation $W^* \to V^*$ follows easily from
the formulas $(f + g) \circ T = f \circ T + g \circ T$ and $(cf) \circ T = c(f \circ T)$,
where $f, g \in W^*$ and $c \in k$ (note that $cf$ is a functional on $W$, for $W^*$
is a vector space). ■


Proposition A.47. If $v_1, \dots, v_n$ is a basis of a vector space $V$ over a field $k$,
then there are functionals $v_j^*\colon V \to k$, for each $j$, with

$$v_j^*(v_i) = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j, \end{cases}$$

and $v_1^*, \dots, v_n^*$ is a basis of $V^*$ (it is called the dual basis).

Proof. By Theorem A.43, it suffices to prescribe the values of $v_j^*$ on a basis
of $V$.

Linear Independence: If $\sum_j c_j v_j^* = 0$, then $\sum_j c_j v_j^*(v) = 0$ for all $v \in V$.
But $\sum_j c_j v_j^*(v_i) = c_i$, so that all the coefficients $c_i$ are 0 and, hence,
$v_1^*, \dots, v_n^*$ is linearly independent.

Spanning: If $g \in V^*$, then $g(v_j) = d_j \in k$ for all $j$. But $g = \sum_j d_j v_j^*$, for
both sides send each $v_j$ to $d_j$. Thus, $g$ is a linear combination of $v_1^*, \dots, v_n^*$. ■

Corollary A.48. If $V$ is an $n$-dimensional vector space, then
$\dim(V^*) = n = \dim(V)$.

Proof. A basis of $V$ and its dual basis have the same number of elements. ■

If $T\colon V \to W$ is a linear transformation, what is a matrix associated to the
linear transformation $T^*\colon W^* \to V^*$?

Lemma A.49. Let $v_1, \dots, v_n$ be a basis of a vector space $V$ over $k$. If $g \in V^*$,
then $g = d_1 v_1^* + \cdots + d_n v_n^*$, where $d_j = g(v_j)$ for all $j$. Therefore, the
coordinate list of $g$ relative to the dual basis $v_1^*, \dots, v_n^*$ of $V^*$ is $d_1, \dots, d_n$.

Proof. We saw this in the proof of Proposition A.47, when showing that the
dual basis spans $V^*$. ■
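To see the dual basis and Lemma A.49 in a concrete case, here is a small Python sketch for $V = \mathbb{Q}^2$. It assumes a functional on $\mathbb{Q}^2$ is stored as a pair $(p, q)$ standing for $(x, y) \mapsto px + qy$; the function names and the sample basis are illustrative choices, not taken from the text. The dual basis is read off from the rows of the inverse of the matrix whose columns are $v_1, v_2$.

```python
from fractions import Fraction


def dual_basis_2d(v1, v2):
    """Dual basis of the basis v1, v2 of Q^2, each functional as a coefficient pair."""
    a, c = v1
    b, d = v2
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("v1, v2 is not a basis")
    # Rows of the inverse of the matrix [[a, b], [c, d]] with columns v1, v2.
    return [(d / det, -b / det), (-c / det, a / det)]


def evaluate(functional, v):
    """Value of the functional (p, q) at the vector (x, y), namely p*x + q*y."""
    return functional[0] * v[0] + functional[1] * v[1]


v1, v2 = (1, 1), (1, 2)
v1_star, v2_star = dual_basis_2d(v1, v2)

g = (3, 5)  # the functional g(x, y) = 3x + 5y
d1, d2 = evaluate(g, v1), evaluate(g, v2)
# Lemma A.49 predicts g = d1 * v1_star + d2 * v2_star.
combination = tuple(d1 * s + d2 * t for s, t in zip(v1_star, v2_star))
```

Here `combination` should reproduce the coefficient pair of `g`, which is the content of the lemma in this example.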

The next result shows that dual spaces are intimately related to transposing
matrices. If $A = [a_{ij}]$ is an $m \times n$ matrix, then its transpose $A^T$ is the $n \times m$
matrix $[a_{ji}]$ whose $ij$ entry is $a_{ji}$. In words, for each $i$, the $i$th row $a_{i1}, \dots, a_{in}$
of $A$ is the $i$th column of $A^T$ (and, necessarily, each $j$th column of $A$ is the
$j$th row of $A^T$).

Proposition A.50. If $T\colon V \to W$ is a linear transformation and $A$ is the matrix
of $T$ arising from bases $v_1, \dots, v_n$ of $V$ and $w_1, \dots, w_m$ of $W$, then the matrix
of $T^*\colon W^* \to V^*$ arising from the dual bases is the transpose $A^T$ of $A$.

Proof. Let $B$ be the matrix associated to $T^*\colon W^* \to V^*$. The recipe for
constructing $B$ says that if $w_i^*$ is a basis element, then the $i$th column of $B$ is the
coordinate list of $T^*(w_i^*)$. Let's unwind this. First, $T^*(w_i^*) = w_i^* \circ T$. Second,
the coordinate list of $w_i^* \circ T$ is obtained by writing it as a linear combination
of $v_1^*, \dots, v_n^*$: Lemma A.49 does this by computing $(w_i^* \circ T)(v_j)$ for all $j$.
Now

$$(w_i^* \circ T)(v_j) = w_i^*(T(v_j)) = w_i^*(a_{1j} w_1 + \cdots + a_{mj} w_m) = a_{ij}.$$

Thus, the $i$th column of $B$ is $a_{i1}, \dots, a_{in}$; that is, the $i$th column of $B$ is the
$i$th row of $A$. In other words, $B = A^T$. ■
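The computation in this proof can be replayed mechanically. The sketch below is an illustration under stated assumptions: it works with the standard bases of $k^n$ and $k^m$, so that $w_i^*$ simply picks out the $i$th coordinate, and the names `apply_matrix` and `dual_matrix` are hypothetical helpers. It fills column $i$ of $B$ with the values $(w_i^* \circ T)(e_j)$ and compares the result with the transpose.

```python
def apply_matrix(A, v):
    """Matrix-vector product Av."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dual_matrix(A):
    """Matrix of T* in the dual bases, where T(v) = Av in the standard bases."""
    m, n = len(A), len(A[0])
    columns = []
    for i in range(m):
        column = []
        for j in range(n):
            e_j = [1 if t == j else 0 for t in range(n)]
            # (w_i^* o T)(e_j): the i-th coordinate of T(e_j).
            column.append(apply_matrix(A, e_j)[i])
        columns.append(column)
    # Assemble the n x m matrix B whose i-th column is columns[i].
    return [[columns[i][j] for i in range(m)] for j in range(n)]


A = [[1, 2, 3],
     [4, 5, 6]]
B = dual_matrix(A)
transpose = [list(row) for row in zip(*A)]
```

For this sample matrix, `B` and `transpose` should agree entry by entry.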

If $T\colon V \to W$ is a linear transformation, then the domain of $T^*$ is $W^*$,
which depends on the target of $T$. Suppose that $W$ is a subspace of a vector
space $U$; let $i\colon W \to U$ be the inclusion. Now $S = i \circ T\colon V \to U$ is also
a linear transformation. The transformations $T$ and $S$ have the same domain,
namely $V$, and the same graph (for $T(v) = S(v)$ for all $v \in V$); they differ
only in their targets. Now $T^*\colon W^* \to V^*$, while $S^*\colon U^* \to V^*$. Since $T^*$ and
$S^*$ have different domains, they are certainly different functions, for we have
agreed that the domain of a function is a necessary ingredient of its definition.
We conclude that $S$ and $T$ are distinct; that is, if you like transposes of
matrices, then you must admit that targets are essential ingredients of functions.


Exercises 

A.18 (i) * If $k$ is a field, $c \in k$, and $f\colon k \to k$ is a function, define a new function
$cf\colon k \to k$ by $a \mapsto c f(a)$. With this definition of scalar multiplication,
prove that the commutative ring $k^k$ of all functions on $k$ is a vector space
over $k$ (see Example A.35(iii)).

(ii) Prove that Poly($k$), the set of all polynomial functions $k \to k$, is a subspace
of $k^k$.

A.19 If the only subspaces of a vector space $V$ are $\{0\}$ and $V$ itself, prove that
$\dim(V) \le 1$.

A.20 Prove, in the presence of all the other axioms in the definition of vector space,
that the commutative law for vector addition is redundant; that is, if $V$ satisfies
all the other axioms, then $u + v = v + u$ for all $u, v \in V$.

Hint. If $u, v \in V$, evaluate $-[(-v) + (-u)]$ in two ways.

A.21 If $V$ is a vector space over $\mathbb{F}_2$ and if $v_1 \ne v_2$ are nonzero vectors in $V$, prove
that $v_1, v_2$ is linearly independent. Is this true for vector spaces over any other
field?

A.22 Prove that the columns of an $m \times n$ matrix $A$ over a field $k$ are linearly dependent
in $k^m$ if and only if the homogeneous system $Ax = 0$ has a nontrivial solution.

A.23 Prove that the list of polynomials $1, x, x^2, x^3, \dots, x^{100}$ is a linearly independent
list in $k[x]$, where $k$ is a field.

A.24 * Let $X = v_1, \dots, v_n$ be a list in a vector space $V$, and let $Y = y_1, \dots, y_n$ be a
permutation of $v_1, \dots, v_n$.

(i) Prove that $X$ spans $V$ if and only if $Y$ spans $V$.

(ii) Prove that $X$ is linearly independent if and only if $Y$ is linearly independent.

(iii) Prove that $X$ is a basis of $V$ if and only if $Y$ is a basis of $V$.

(iv) Conclude that spanning, being linearly independent, or being a basis are
properties of a subset of vectors, not merely of a list of vectors. See
Corollary A.23.

A.25 It is shown in analytic geometry that if $\ell_1$ and $\ell_2$ are lines with slopes $m_1$ and
$m_2$, respectively, then $\ell_1$ and $\ell_2$ are perpendicular if and only if $m_1 m_2 = -1$. If

$$\ell_i = \{\alpha v_i + u_i : \alpha \in \mathbb{R}\},$$

for $i = 1, 2$, prove that $m_1 m_2 = -1$ if and only if the dot product $v_1 \cdot v_2 = 0$.
(Since both lines have slopes, neither is vertical.) See Lemma 3.16.

A.26 (i) In calculus, a line in space passing through a point $u$ is defined as

$$\{u + \alpha w : \alpha \in \mathbb{R}\} \subseteq \mathbb{R}^3,$$

where $w$ is a fixed nonzero vector. Show that every line through the origin is
a one-dimensional subspace of $\mathbb{R}^3$.

(ii) In calculus, a plane in space passing through a point $u$ is defined as the subset

$$\{v \in \mathbb{R}^3 : (v - u) \cdot n = 0\} \subseteq \mathbb{R}^3,$$

where $n \ne 0$ is a fixed normal vector. Prove that a plane through the origin
is a two-dimensional subspace of $\mathbb{R}^3$.

If the origin $(0, 0, 0)$ lies on a plane $H$, then $u = 0$ and

$$H = \{v = (x, y, z) \in \mathbb{R}^3 : v \cdot n = 0\},$$

where $n = (\alpha, \beta, \gamma)$ is a (nonzero) normal vector; that is, $H$ is the set of all
vectors orthogonal to $n$.

A.27 If $U$ and $W$ are subspaces of a vector space $V$, define

$$U + W = \{u + w : u \in U \text{ and } w \in W\}.$$

(i) Prove that $U + W$ is a subspace of $V$.

(ii) If $U$ and $U'$ are subspaces of a finite-dimensional vector space $V$, prove that

$$\dim(U) + \dim(U') = \dim(U \cap U') + \dim(U + U').$$

Hint. Extend a basis of $U \cap U'$ to a basis of $U$ and to a basis of $U'$.

A.28 If $U$ and $W$ are vector spaces over a field $k$, define their direct sum to be the set
of all ordered pairs,

$$U \oplus W = \{(u, w) : u \in U \text{ and } w \in W\},$$

with addition

$$(u, w) + (u', w') = (u + u', w + w')$$

and scalar multiplication

$$\alpha(u, w) = (\alpha u, \alpha w).$$

(i) Show that $U \oplus W$ is a vector space.

(ii) If $U$ and $W$ are finite-dimensional vector spaces over a field $k$, prove that

$$\dim(U \oplus W) = \dim(U) + \dim(W).$$

A.29 * Let $S, T\colon V \to W$ be linear transformations, where $V$ and $W$ are vector spaces
over a field $k$. Prove that if there is a basis $v_1, \dots, v_n$ of $V$ for which $S(v_j) =
T(v_j)$ for all $j$, then $S = T$.

A.30 * Prove that the dual space V* of a vector space V over a field k is a vector space 
over k. 

A.4 Inequalities 

Many properties of inequality follow from a few basic properties. Denote the 
set of all positive real numbers by P (we do not regard 0 as positive). We 
assume the set P satisfies 

(i) $a, b \in P$ implies $a + b \in P$

(ii) $a, b \in P$ implies $ab \in P$

(iii) Trichotomy: If $a$ is a number, then exactly one of the following is true:

$$a \in P, \quad a = 0, \quad -a \in P.$$

The first two properties say that P is closed under addition and multiplica- 
tion. We now define inequality. 

Definition. Given real numbers $a$ and $b$, we say that $a$ is less than $b$, written
$a < b$, if $b - a \in P$; we say that $a$ is less than or equal to $b$, written $a \le b$, if
$b - a \in P \cup \{0\}$; that is, $a < b$ or $a = b$.

Here $P \cup \{0\}$ is the set of all nonnegative real numbers.

Other notation: if $a < b$, we may write $b > a$ and, if $b \le a$, we may write
$a \ge b$.

Thus, $a$ is positive if $0 < a$ (that is, $a \in P$), and $a$ is negative if $a < 0$ (that
is, $-a \in P$).

Here are some standard properties of inequality.

Just to complete the picture, $a > b$ means $b < a$ (and $a \ge b$ means $b \le a$).

Proposition A.51. Let $a, b, B$ be real numbers with $b < B$.

(i) If $a > 0$, then $ab < aB$; if $a < 0$, then $ab > aB$.

(ii) If $a > 0$ and $b < 0$, then $ab < 0$.

(iii) If $b > 0$, then $b^{-1} > 0$; if $b < 0$, then $b^{-1} < 0$.

(iv) $a + b < a + B$ and $a - b > a - B$.

(v) If $c, d$ are positive, then $d < c$ if and only if $d^{-1} > c^{-1}$.

Proof. We prove the first three parts; the last two proofs are similar and appear
in Exercise A.31 below.

(i) By definition, $b < B$ means that $B - b \in P$.

• Suppose that $a > 0$; that is, $a \in P$. To show that $ab < aB$, we must
show that $aB - ab = a(B - b) \in P$, and this follows from Property (ii)
of $P$.

• If $a < 0$, then $-a \in P$. Therefore, $(-a)(B - b) \in P$, and so

$$(-a)(B - b) = (-1)a(B - b) = a(b - B) \in P;$$

that is, $ab - aB \in P$, and so $ab > aB$.

(ii) The first part says that if $b < B$ and both sides are multiplied by a positive
number, then the sense of the inequality stays the same. So, if $b < 0$, then
$ab < a \cdot 0 = 0$.

(iii) Suppose that $b > 0$. If $b^{-1} < 0$, then

$$1 = b \cdot b^{-1} < b \cdot 0 = 0,$$

a contradiction. If $b^{-1} = 0$, then

$$1 = b b^{-1} = b \cdot 0 = 0,$$

another contradiction. Hence, Trichotomy gives $b^{-1} > 0$. The case $b < 0$ is similar. ■


Exercises 


“Disprove” here means “give a concrete counterexample.” “Salvage” means
“add a hypothesis to make it true.”


A.31 * Prove parts (iv) and (v) of Proposition A.51.

A.32 Prove, or disprove and salvage if possible. Suppose $a$, $b$, $c$, and $d$ are real
numbers.

(i) If $a < b$, then $a^2 < b^2$.

(ii) If $a^2 < b^2$, then $a < b$.

(iii) If $a < b$ and $c < d$, then $ac < bd$.

(iv) If $a^3 > 0$, then $a > 0$.

A.33 Does $\mathbb{C}$ have a subset $P'$ like $P$; that is, $P'$ is closed under addition and
multiplication, and it satisfies Trichotomy?


We do not assume that $*$ is commutative; that is,
$a * b \ne b * a$ is allowed.


A.5 Generalized Associativity 

Recall that a set with a binary operation is a set $G$ equipped with a function
$G \times G \to G$; we denote the value of the function by $(a, b) \mapsto a * b$. Examples
of such sets are the real numbers and the complex numbers, each of which
is usually viewed as having two binary operations: addition $(a, b) \mapsto a + b$
and multiplication $(a, b) \mapsto ab$. More generally, every commutative ring has
binary operations addition and multiplication. Another example is given by
$G = X^X$, the family of all functions from a set $X$ to itself: composition of
functions is a binary operation on $G$.

The adjective binary means two: two elements $a, b \in G$ are combined to
produce the element $a * b \in G$. But it is often necessary to combine more than
two elements: for example, we may have to multiply several numbers. The
binary operations in the examples cited above are associative; we can combine
three elements unambiguously. If $a, b, c \in G$, then

$$a * (b * c) = (a * b) * c.$$

Since we are told only how to combine two elements, there is a choice when
confronted with three elements: first combine $b$ and $c$, obtaining $b * c$, and
then combine this new element with $a$ to get $a * (b * c)$, or first get $a * b$
and then combine it with $c$ to get $(a * b) * c$. Associativity says that either
choice yields the same element of $G$. Thus, there is no confusion in writing
$a * b * c$ without parentheses. In contrast, subtraction is not associative, for it
is not clear whether $a - b - c$ means $(a - b) - c$ or $a - (b - c)$, and these may
be different: $9 - (5 - 3) = 7$ while $(9 - 5) - 3 = 1$.

Suppose we want to combine more than three elements; must we assume
more complicated identities? Consider powers of real numbers, for example.
Is it obvious that $a^3 a^2 = (a[a a^2])a$? The remarkable fact is: assuming we
don't need parentheses for three factors, we don't need parentheses for more
than three factors. To make all concrete, we now call a binary operation
multiplication, and we simplify notation by omitting $*$ and writing $ab$ instead of
$a * b$.

Definition. Let $G$ be a set with a binary operation; an expression in $G$ is an
$n$-tuple $(a_1, a_2, \dots, a_n) \in G \times \cdots \times G$ that is rewritten as $a_1 a_2 \cdots a_n$; we call
the $a_i$ factors of the expression.

An expression yields many elements of $G$ by the following procedure.
Choose two adjacent $a$'s, multiply them, and obtain an expression with $n - 1$
factors: the new product just formed and $n - 2$ original factors. In the shorter
new expression, choose two adjacent factors (either an original pair or an
original one together with the new product from the first step) and multiply them.
Repeat this procedure until there is a penultimate expression having only two
factors; multiply them and obtain an element of $G$ that we call an ultimate
product. For example, consider the expression $abcd$. We may first multiply
$ab$, obtaining $(ab)cd$, an expression with three factors, namely, $ab$, $c$, $d$. We
may now choose either the pair $c, d$ or the pair $ab, c$; in either case, multiply
them to obtain expressions having two factors: $ab, cd$, or $(ab)c, d$. The
two factors in the last expressions can now be multiplied to give two ultimate
products from $abcd$, namely $(ab)(cd)$ and $((ab)c)d$. Other ultimate products
derived from the expression $abcd$ arise from multiplying $bc$ or $cd$ as the first
step. It is not obvious whether the ultimate products from a given expression
are equal.
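The procedure for forming ultimate products can be sketched in a few lines of Python. This is an illustration only; the function name `ultimate_products` and the sample operations are choices made here, not part of the text. The sketch uses the observation that the last multiplication of any ultimate product splits the expression into a left block and a right block, so all ultimate products can be collected recursively.

```python
def ultimate_products(factors, op):
    """Set of all ultimate products of the expression `factors` under `op`."""
    if len(factors) == 1:
        return {factors[0]}
    results = set()
    # The final multiplication combines a product of the first `split` factors
    # with a product of the remaining ones.
    for split in range(1, len(factors)):
        for left in ultimate_products(factors[:split], op):
            for right in ultimate_products(factors[split:], op):
                results.add(op(left, right))
    return results


# String concatenation is associative: every parenthesization agrees.
concatenations = ultimate_products(("a", "b", "c", "d"), lambda x, y: x + y)

# Subtraction is not associative: 9 - 5 - 3 has two different ultimate products.
differences = ultimate_products((9, 5, 3), lambda x, y: x - y)
```

With subtraction, the two values found are $(9 - 5) - 3$ and $9 - (5 - 3)$, while concatenation yields a single string.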

Definition. Let $G$ be a set with a binary operation. An expression $a_1 a_2 \cdots a_n$
in $G$ needs no parentheses if all its ultimate products are equal elements of $G$.

Theorem A.52 (Generalized Associativity). If $G$ is a set with an associative
binary operation, then every expression $a_1 a_2 \cdots a_n$ in $G$ needs no parentheses.

Proof. The proof is by induction on $n \ge 3$. The base step holds because the
operation is associative. For the inductive step, consider two ultimate
products $U$ and $V$ obtained from a given expression $a_1 a_2 \cdots a_n$ after two series of
choices:

$$U = (a_1 \cdots a_i)(a_{i+1} \cdots a_n) \quad \text{and} \quad V = (a_1 \cdots a_j)(a_{j+1} \cdots a_n);$$

the parentheses indicate the penultimate products displaying the last two
factors that multiply to give $U$ and $V$, respectively; there are many parentheses
inside each of the shorter expressions. We may assume that $i \le j$. Since each
of the four expressions in parentheses has fewer than $n$ factors, the inductive
hypothesis says that each of them needs no parentheses. It follows that $U = V$
if $i = j$. If $i < j$, then the inductive hypothesis allows the first expression to
be rewritten as

$$U = (a_1 \cdots a_i)\bigl([a_{i+1} \cdots a_j][a_{j+1} \cdots a_n]\bigr)$$

and the second to be rewritten as

$$V = \bigl([a_1 \cdots a_i][a_{i+1} \cdots a_j]\bigr)(a_{j+1} \cdots a_n),$$

where each of the expressions $a_1 \cdots a_i$, $a_{i+1} \cdots a_j$, and $a_{j+1} \cdots a_n$ needs no
parentheses. Thus, the three expressions yield unique elements $A$, $B$, and $C$ in
$G$, respectively. The first expression gives $U = A(BC)$ in $G$, the second gives
$V = (AB)C$ in $G$, and so $U = V$ in $G$, by associativity. ■

Corollary A.53. If $G$ is a set with an associative binary operation, $a \in G$, and $m, n \ge 1$,
then

$$a^{m+n} = a^m a^n \quad \text{and} \quad (a^m)^n = a^{mn}.$$

Proof. In the first case, both elements arise from the expression having $m + n$
factors each equal to $a$; in the second case, both elements arise from the
expression having $mn$ factors each equal to $a$. ■


Computer Algebra is 
used regularly in many 
high school classrooms, 
implemented either on 
handheld devices or with 
tablet apps. 


A.6 A Cyclotomic Integer Calculator 

Several times in the previous chapters, we’ve advised you to use a CAS. Some 
uses are simply to reduce the computational overhead of algebraic calculations, 
such as the expansion in Lagrange interpolation on page 272. For this, you can 
use the CAS “right out of the box:” all the functionality you need is built in 
with commands like expand or simplify. 

Other applications require programming that uses specific syntax for the 
CAS in use. A good example is the formula in Example 6.61 on page 265. 
The recursive formula for $\Phi_n$ can be implemented in almost any CAS, but the
details for how to get a product over the divisors of an integer (especially if 
the product is in the denominator of an expression) can either be trivial, if the 
functionality is built-in, or extremely tricky to implement if it is not. There are 
many CAS environments, so it would be of little use to include actual programs 
here. 

What we can do in this Appendix is point out how to use Proposition 7.20
on page 290 to model $\mathbb{Q}(\alpha)$, where $\alpha$ is algebraic over $\mathbb{Q}$ and its minimal
polynomial $p = \operatorname{irr}(\alpha, \mathbb{Q})$ is known. The essential piece of that Proposition is
that

$$\mathbb{Q}(\alpha) \cong \mathbb{Q}[x]/(p(x));$$

so, as long as your CAS can find the remainder when one polynomial is divided
by another, you can use it to perform “modular arithmetic” with polynomials
in $\mathbb{Q}(\alpha)$.

If $p$ is a prime in $\mathbb{Z}$ and $\zeta = \cos(2\pi/p) + i \sin(2\pi/p)$, we know that

$$\operatorname{irr}(\zeta, \mathbb{Q}) = 1 + x + x^2 + \cdots + x^{p-1}.$$

This is easily implemented in a CAS with something like

Phi(x,p) := sum(x^k, k, 0, p-1)

Suppose that your CAS command for polynomial remainder is pmod. For
example,

pmod(x^3+4x^2-3x+1, x^2+1)

returns $-4x - 3$, the remainder when $x^3 + 4x^2 - 3x + 1$ is divided by $x^2 + 1$.
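For readers without a CAS at hand, the two commands can be imitated in plain Python. The sketch below is an assumption-laden stand-in, not the syntax of any particular CAS: a polynomial is stored as a list of coefficients with the constant term first, and the names `phi` and `pmod` merely mirror the commands used above. Exact rational arithmetic comes from the standard `fractions` module.

```python
from fractions import Fraction


def phi(p):
    """Coefficients (constant term first) of 1 + x + ... + x^(p-1)."""
    return [Fraction(1)] * p


def pmod(f, g):
    """Remainder when f is divided by g (g must have a nonzero leading coefficient)."""
    r = [Fraction(c) for c in f]
    g = [Fraction(c) for c in g]
    while len(r) >= len(g):
        factor = r[-1] / g[-1]        # cancel the current leading term of r
        shift = len(r) - len(g)
        for i, c in enumerate(g):
            r[shift + i] -= factor * c
        r.pop()                       # the leading coefficient is now zero
    while r and r[-1] == 0:           # drop trailing zero coefficients
        r.pop()
    return r


# x^3 + 4x^2 - 3x + 1 divided by x^2 + 1
remainder = pmod([1, -3, 4, 1], [1, 0, 1])
```

In this representation the remainder $-4x - 3$ appears as the list `[-3, -4]`.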

The two functions, Phi and pmod, allow us to calculate in $\mathbb{Q}(\zeta)$. Let's look
at some examples.

Eisenstein Integers 

Arithmetic with complex numbers is built into most CAS environments, so that
you can do calculations with Gaussian integers right away. Arithmetic with
Eisenstein integers isn't usually built in, but you can build a model of $\mathbb{Z}[\omega]$ by
thinking of an Eisenstein integer as a congruence class mod $x^2 + x + 1$:

cl(f) := pmod(f, Phi(x,3))

or even

cl(f) := pmod(f, x^2+x+1).

Addition and multiplication of classes are defined as in Chapter 7:

add(f,g) = cl(f+g)
mult(f,g) = cl(fg).

So, now we can compute: to find, for example, $3\omega^5 - \omega^2 + 1$, you want the
class of $3x^5 - x^2 + 1$ mod $x^2 + x + 1$:

cl(3x^5-x^2+1)

> -2x-1

And, sure enough,

$$3\omega^5 - \omega^2 + 1 = -1 - 2\omega.$$

Your model can do generic calculations, giving the rules for addition and
multiplication in $\mathbb{Z}[\omega]$:

add(a+b*x, c+d*x)

> a+c+(b+d)*x

mult(a+b*x, c+d*x)

> a*c-b*d+(a*d+b*c-b*d)*x

You can generate Eisenstein triples by squaring Eisenstein integers:

mult(3+2*x, 3+2*x)

> 5+8*x

mult(5+x, 5+x)

> 24+9*x

mult(4+3*x, 4+3*x)

> 7+15*x
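The same Eisenstein calculator can be modeled in Python. This is a hedged sketch rather than a CAS program: polynomials are coefficient lists with the constant term first, so $a + b\omega$ is the list `[a, b]`, and `pmul`, `pmod`, `cl`, `add`, and `mult` are illustrative re-implementations of the commands named above.

```python
from fractions import Fraction


def pmul(f, g):
    """Product of two polynomials given as coefficient lists."""
    result = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            result[i + j] += a * b
    return result


def pmod(f, g):
    """Remainder when f is divided by g (leading coefficient of g nonzero)."""
    r = [Fraction(c) for c in f]
    g = [Fraction(c) for c in g]
    while len(r) >= len(g):
        factor = r[-1] / g[-1]
        shift = len(r) - len(g)
        for i, c in enumerate(g):
            r[shift + i] -= factor * c
        r.pop()
    while r and r[-1] == 0:
        r.pop()
    return r


MODULUS = [1, 1, 1]  # x^2 + x + 1, the minimal polynomial of omega


def cl(f):
    """Congruence class of f mod x^2 + x + 1, as a reduced coefficient list."""
    return pmod(f, MODULUS)


def add(f, g):
    n = max(len(f), len(g))
    padded_f = list(f) + [0] * (n - len(f))
    padded_g = list(g) + [0] * (n - len(g))
    return cl([a + b for a, b in zip(padded_f, padded_g)])


def mult(f, g):
    return cl(pmul(f, g))


value = cl([1, 0, -1, 0, 0, 3])                  # 3x^5 - x^2 + 1
squares = [mult(z, z) for z in ([3, 2], [5, 1], [4, 3])]
```

Reading the lists back as polynomials in $x$ reproduces the CAS outputs shown above, for instance `[5, 8]` for $(3 + 2\omega)^2$.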

Symmetric Polynomials 

In Example 9.6 on page 388, we derived the cubic formula via symmetric
polynomials. There, we defined

$$\begin{aligned}
s &= \alpha_1 + \alpha_2 \omega + \alpha_3 \omega^2 \\
u &= \alpha_1 + \alpha_2 \omega^2 + \alpha_3 \omega,
\end{aligned}$$

and we saw that $s + u = 3\alpha_1$.
We also claimed that

$$s^3 + u^3 = 27 \alpha_1 \alpha_2 \alpha_3.$$

Our CAS Eisenstein calculator can help. Replacing $\alpha_1, \alpha_2, \alpha_3$ by a, b, c, we have

add((a+b*x+c*x^2)^3, (a+b*x^2+c*x)^3)

> 2*a^3-3*a^2*b-3*a^2*c-3*a*b^2+12*a*b*c-3*a*c^2+2*
b^3-3*b^2*c-3*b*c^2+2*c^3

factor(2*a^3-3*a^2*b-3*a^2*c-3*a*b^2+12*a*b*c-3*a*
c^2+2*b^3-3*b^2*c-3*b*c^2+2*c^3)

> (a+b-2*c)*(a-2*b+c)*(2*a-b-c)

This can be written (with an eye to symmetric polynomials) as

$$(a + b + c - 3c)(a + b + c - 3b)(3a - (a + b + c)).$$

Since $a + b + c = 0$, the product is $(-3c)(-3b)(3a) = 27abc$, which is exactly
what we claimed in Example 9.6.

Algebra with Periods 

One last example: in Section 7.3, we outlined Gauss's construction of the
regular 17-gon with ruler and compass. Central to that is the specification of
“periods” of various lengths, listed on page 323. They are constructed according
to the formula on page 325: if $ef = 16$, the periods of length $f$ are given by

$$\eta_{e,k} = \sum_{j=0}^{f-1} \zeta^{3^{k+ej}}.$$

A CAS model follows the syntax pretty closely:

n(e,k) := sum(x^(mod(3^(k+e*j), 17)), j, 0, (16/e)-1).

mod is the CAS built-in “mod” function.

Now change cl so that it gives the congruence class mod $\Phi_{17}(x)$:

cl(f) := pmod(f, Phi(x,17)),

and we can calculate with the classes of the periods:

n(2,0)

> x^16+x^15+x^13+x^9+x^8+x^4+x^2+x

n(2,1)

> x^14+x^12+x^11+x^10+x^7+x^6+x^5+x^3

cl(add(n(2,0), n(2,1)))

> -1

cl(mult(n(2,0), n(2,1)))

> -4

So, as we claimed on page 323, $\eta_{2,0}$ and $\eta_{2,1}$ are roots of

$$x^2 + x - 4.$$
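The period calculation can also be checked in Python. The sketch below rests on the same assumptions as a CAS session would: 3 is used as the generator mod 17, exactly as in the CAS command for n(e,k), polynomials are coefficient lists with the constant term first, and the helper names `period`, `pmul`, `pmod`, and `cl` are illustrative.

```python
from fractions import Fraction


def pmul(f, g):
    """Product of two polynomials given as coefficient lists."""
    result = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            result[i + j] += a * b
    return result


def pmod(f, g):
    """Remainder when f is divided by g (leading coefficient of g nonzero)."""
    r = [Fraction(c) for c in f]
    g = [Fraction(c) for c in g]
    while len(r) >= len(g):
        factor = r[-1] / g[-1]
        shift = len(r) - len(g)
        for i, c in enumerate(g):
            r[shift + i] -= factor * c
        r.pop()
    while r and r[-1] == 0:
        r.pop()
    return r


PHI_17 = [1] * 17  # 1 + x + ... + x^16


def cl(f):
    """Congruence class of f mod Phi_17(x)."""
    return pmod(f, PHI_17)


def period(e, k):
    """Polynomial model of the period: sum of x^(3^(k + e*j) mod 17), j = 0..f-1."""
    f = 16 // e
    coeffs = [0] * 17
    for j in range(f):
        coeffs[pow(3, k + e * j, 17)] += 1
    return coeffs


eta0, eta1 = period(2, 0), period(2, 1)
exponents0 = [i for i, c in enumerate(eta0) if c]     # exponents appearing in n(2,0)
total = cl([a + b for a, b in zip(eta0, eta1)])       # class of the sum
product = cl(pmul(eta0, eta1))                        # class of the product
```

The sum and product come out as the constants $-1$ and $-4$, matching the coefficients of $x^2 + x - 4$.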

Exercises 

A.34 Find a polynomial in Q[x] that has roots r ) 0 il3,k' r l4,k-